Chapter 11

Chapter 11. Client-Side Scripting

11.1 "Socket to Me!"

The previous chapter introduced Internet fundamentals and explored sockets -- the underlying communications mechanism over which bytes flow on the Net. In this chapter, we climb the encapsulation hierarchy one level, and shift our focus to Python tools that support the client-side interfaces of common Internet protocols.

We talked about the Internet's higher-level protocols in the abstract at the start of the last chapter, and you should probably review that material if you skipped over it the first time around. In short, protocols define the structure of the conversations that take place to accomplish most of the Internet tasks we're all familiar with -- reading email, transferring files by FTP, fetching web pages, and so on.

At the most basic level, all these protocol dialogs happen over sockets using fixed and standard message structures and ports, so in some sense this chapter builds upon the last. But as we'll see, Python's protocol modules hide most of the underlying details -- scripts generally need deal only with simple objects and methods, and Python automates the socket and messaging logic required by the protocol.

In this chapter, we'll concentrate on the FTP and email protocol modules in Python, and peek at a few others along the way (NNTP news, HTTP web pages, and so on). All of the tools employed in examples here are in the standard Python library and come with the Python system. All of the examples here are also designed to run on the client side of a network connection -- these scripts connect to an already-running server to request interaction and can be run from a simple PC. In the next chapter, we'll move on to explore scripts designed to be run on the server side instead. For now, let's focus on the client.

11.2 Transferring Files over the Net

As we saw in the previous chapter, sockets see plenty of action on the Net. For instance, the getfile example at the end of that chapter allowed us to transfer entire files between machines. In practice, though, higher-level protocols are behind much of what happens on the Net. Protocols run on top of sockets, but hide much of the complexity of the network scripting examples we've just seen.

FTP -- the File Transfer Protocol -- is one of the more commonly used Internet protocols. It defines a higher-level conversation model that is based on exchanging command strings and file contents over sockets. By using FTP, we can accomplish the same task as the prior chapter's getfile script, but the interface is simpler, and standard -- FTP lets us ask for files from any server machine that supports FTP, without requiring that it run our custom getfile script. FTP also supports more advanced operations such as uploading files to the server, getting remote directory listings, and more.

Really, FTP runs on top of two sockets: one for passing control commands between client and server (port 21), and another for transferring bytes. By using a two-socket model, FTP avoids the possibility of deadlocks (i.e., transfers on the data socket do not block dialogs on the control socket). Ultimately, though, Python's ftplib support module allows us to upload and download files at a remote server machine by FTP, without dealing in raw socket calls or FTP protocol details.

11.2.1 FTP: Fetching Python with Python

Because the Python FTP interface is so easy to use, let's jump right into a realistic example. The script in Example 11-1 automatically fetches and builds Python with Python. No, this isn't a recursive chicken-and-egg thought exercise -- you must already have installed Python to run this program. More specifically, this Python script does the following:

Downloads the Python source distribution by FTP
Unpacks and compiles the distribution into a Python executable

The download portion will run on any machine with Python and sockets; the unpacking and compiling code assumes a Unix-like build environment as coded here, but could be tweaked to work with other platforms.

Example 11-1. PP2E\Internet\Ftp\getpython.py

#!/usr/local/bin/python
###############################################################
# A Python script to download and build Python's source code.
# Uses ftplib, the ftp protocol handler which uses sockets.
# Ftp runs on 2 sockets (one for data, one for control--on
# ports 20 and 21) and imposes message text formats, but the 
# Python ftplib module hides most of this protocol's details.
###############################################################

import os
from ftplib import FTP                   # socket-based ftp tools
Version = '1.5'                          # version to download
tarname = 'python%s.tar.gz' % Version    # remote/local file name
 
print 'Connecting...'
localfile  = open(tarname, 'wb')         # where to store download
connection = FTP('ftp.python.org')       # connect to ftp site
connection.login()                       # default is anonymous login
connection.cwd('pub/python/src')         # xfer 1k at a time to localfile

print 'Downloading...'
connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()
localfile.close()
 
print 'Unpacking...'
os.system('gzip -d '  + tarname)         # decompress
os.system('tar -xvf ' + tarname[:-3])    # strip .gz 

print 'Building...'
os.chdir('Python-' + Version)            # build Python itself
os.system('./configure')                 # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version

Most of the FTP protocol details are encapsulated by the Python ftplib module imported here. This script uses some of the simplest interfaces in ftplib (we'll see others in a moment), but they are representative of the module in general:

connection = FTP('ftp.python.org')       # connect to ftp site

To open a connection to a remote (or local) FTP server, create an instance of the ftplib.FTP object, passing in the name (domain or IP-style) of the machine you wish to connect to. Assuming this call doesn't throw an exception, the resulting FTP object exports methods that correspond to the usual FTP operations. In fact, Python scripts act much like typical FTP client programs -- just replace commands you would normally type or select with method calls:

connection.login()                       # default is anonymous login
connection.cwd('pub/python/src')         # xfer 1k at a time to localfile

Once connected, we log in, and go to the remote directory we want to fetch a file from. The login method allows us to pass in additional optional arguments to specify a username and password; by default it performs anonymous FTP:

connection.retrbinary('RETR ' + tarname, localfile.write, 1024)
connection.quit()

Once we're in the target directory, we simply call the retrbinary method to download the target server file in binary mode. The retrbinary call will take awhile to complete, since it must download a big file. It gets three arguments:

An FTP command string -- here, a string RETR filename, which is the standard format for FTP retrievals.
A function or method to which Python passes each chunk of the downloaded file's bytes -- here, the write method of a newly created and opened local file.
A size for those chunks of bytes -- here, 1024 bytes are downloaded at a time, but the default is reasonable if this argument is omitted.

Because this script creates a local file named localfile, of the same name as the remote file being fetched, and passes its write method to the FTP retrieval method, the remote file's contents will automatically appear in a local, client-side file after the download is finished. By the way, notice that this file is opened in "wb" binary output mode; if this script is run on Windows, we want to avoid automatically expanding and \n bytes into \r\n byte sequences (that happens automatically on Windows when writing files opened in "w" text mode).

Finally, we call the FTP quit method to break the connection with the server and manually close the local file to force it to be complete before it is further processed by the shell commands spawned by os.system (it's not impossible that parts of the file are still held in buffers before the close call):

connection.quit()
localfile.close()

And that's all there is to it; all the FTP, socket, and networking details are hidden behind the ftplib interface module. Here is this script in action on a Linux machine, with a couple thousand output lines cut in the interest of brevity:

[lutz@starship test]$ python getpython.py 
Connecting...
Downloading...
Unpacking...
Python-1.5/
Python-1.5/Doc/
Python-1.5/Doc/ref/
Python-1.5/Doc/ref/.cvsignore
Python-1.5/Doc/ref/fixps.py
...
 ...lots of tar lines deleted...
...
Python-1.5/Tools/webchecker/webchecker.py
Python-1.5/Tools/webchecker/websucker.py
Building...
creating cache ./config.cache
checking MACHDEP... linux2
checking CCC...
checking for --without-gcc... no
checking for gcc... gcc
...
 ...lots of build lines deleted...
...
Done: see Python-1.5/python.

[lutz@starship test]$ cd Python-1.5/ 
[lutz@starship Python-1.5]$ ./python 
Python 1.5 (#1, Jul 12 2000, 12:35:52)  [GCC egcs-2.91.66 19990314/Li on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> print 'The Larch!'
The Larch!

Such a script could be automatically executed at regular intervals (e.g., by a Unix cron job) to update a local Python install with a fresh build. But the thing to notice here is that this otherwise typical Python script fetches information from an arbitrarily remote FTP site and machine. Given an Internet link, any information published by an FTP server on the Net can be fetched by and incorporated into Python scripts using interfaces such as these.

11.2.1.1 Using urllib to FTP files

In fact, FTP is just one way to transfer information across the Net, and there are more general tools in the Python library to accomplish the prior script's download. Perhaps the most straightforward is the Python urllib module: given an Internet address string -- a URL, or Universal Resource Locator -- this module opens a connection to the specified server and returns a file-like object ready to be read with normal file object method calls (e.g., read, readlines).

We can use such a higher-level interface to download anything with an address on the Web -- files published by FTP sites (using URLs that start with "ftp://"), web pages and outputs of scripts that live on remote servers (using "http://" URLs), local files (using "file://" URLs), Gopher server data, and more. For instance, the script in Example 11-2 does the same as the one in Example 11-1, but it uses the general urllib module to fetch the source distribution file, instead of the protocol-specific ftplib.

Example 11-2. PP2E\Internet\Ftp\getpython-urllib.py

#!/usr/local/bin/python
###################################################################
# A Python script to download and build Python's source code
# use higher-level urllib instead of ftplib to fetch file
# urllib supports ftp, http, and gopher protocols, and local files
# urllib also allows downloads of html pages, images, text, etc.;
# see also Python html/xml parsers for web pages fetched by urllib;
###################################################################

import os
import urllib                           # socket-based web tools
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name
 
remoteaddr = 'ftp://ftp.python.org/pub/python/src/' + tarname
print 'Downloading', remoteaddr

# this works too:
# urllib.urlretrieve(remoteaddr, tarname)

remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object
localfile  = open(tarname, 'wb')            # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()
 
# the rest is the same
execfile('buildPython.py')

Don't sweat the details of the URL string used here; we'll talk much more about URLs in the next chapter. We'll also use urllib again in this and later chapters to fetch web pages, format generated URL strings, and get the output of remote scripts on the Web.^[1] Technically speaking, urllib supports a variety of Internet protocols (HTTP, FTP, Gopher, and local files), is only used for reading remote objects (not writing or uploading them), and retrievals must generally be run in threads if blocking is a concern. But the basic interface shown in this script is straightforward. The call:

remotefile = urllib.urlopen(remoteaddr)     # returns input file-like object

contacts the server named in the remoteaddr URL string and returns a file-like object connected to its download stream (an FTP-based socket). Calling this file's read method pulls down the file's contents, which are written to a local client-side file. An even simpler interface:

urllib.urlretrieve(remoteaddr, tarname)

also does the work of opening a local file and writing the downloaded bytes into it -- things we do manually in the script as coded. This comes in handy if we mean to download a file, but is less useful if we want to process its data immediately.

Either way, the end result is the same: the desired server file shows up on the client machine. The remainder of the script -- unpacking and building -- is identical to the original version, so it's been moved to a reusable Python file run with the execfile built-in (recall that execfile runs a file as though its code were pasted into the place where the execfile appears). The script is shown in Example 11-3.

Example 11-3. PP2E\Internet\Ftp\buildPython.py

#!/usr/local/bin/python
###############################################################
# A Python script to build Python from its source code.
# Run me in directory where Python source distribution lives.
###############################################################

import os
Version = '1.5'                         # version to build
tarname = 'python%s.tar.gz' % Version   # remote/local file name
 
print 'Unpacking...'
os.system('gzip -d '  + tarname)        # decompress file
os.system('tar -xvf ' + tarname[:-3])   # untar without '.gz'

print 'Building...'
os.chdir('Python-' + Version)           # build Python itself
os.system('./configure')                # assumes unix-style make
os.system('make')
os.system('make test')
print 'Done: see Python-%s/python.' % Version

The output this time is almost identical to the output of Example 11-1, so I'll show only a few portions (the gzip message appears if you don't delete a tar file left by a run in the past):

[lutz@starship test]$ python getpython-urllib.py 
Downloading ftp://ftp.python.org/pub/python/src/python1.5.tar.gz
Unpacking...
gzip: python1.5.tar already exists; do you wish to overwrite (y or n)? y
 ...tar lines... 
Building...
 ...build lines...
Done: see Python-1.5/python.

[lutz@starship test]$ python buildPython.py 
Unpacking...
 ...tar and build lines...

In fact, although the original script is all top-level code that runs immediately and accomplishes only one task, there really are two potentially reusable activities within it: fetching a file and building Python from source. By splitting each part off into a module of its own, we can reuse its program logic in other contexts, which naturally leads us to the topic in the next section.

11.2.2 FTP get and put Utilities

Almost invariably, when I present the ftplib interfaces in Python classes, students ask why programmers need to supply the RETR string in the retrieval method. It's a good question -- the RETR string is the name of the download command in the FTP protocol, but ftplib is supposed to encapsulate that protocol. As we'll see in a moment, we have to supply an arguably odd STOR string for uploads as well. It's boilerplate code that you accept on faith once you see it, but that begs the question. You could always email Guido a proposed ftplib patch, but that's not really a good answer for beginning Python students.^[2]

A better answer is that there is no law against extending the standard library modules with higher-level interfaces of our own -- with just a few lines of reusable code, we can make the FTP interface look any way we want in Python. For instance, we could, once and for all, write utility modules that wrap the ftplib interfaces to hide the RETR string. If we place these utility modules in a directory on PYTHONPATH, they become just as accessible as ftplib itself, automatically reusable in any Python script we write in the future. Besides removing the RETR string requirement, a wrapper module could also make assumptions that simplify FTP operations into single function calls.

For instance, given a module that encapsulates and simplifies ftplib, our Python fetch-and-build script could be further reduced to the script shown in Example 11-4 -- essentially just a function call and file execution.

Example 11-4. PP2E\Internet\Ftp\getpython-modular.py

#!/usr/local/bin/python
################################################################
# A Python script to download and build Python's source code.
# Uses getfile.py, a utility module which encapsulates ftp step.
################################################################

import getfile
Version = '1.5'                         # version to download
tarname = 'python%s.tar.gz' % Version   # remote/local file name

# fetch with utility 
getfile.getfile(tarname, 'ftp.python.org', 'pub/python/src')

# rest is the same
execfile('buildPython.py')

Besides having a line count that is much more impressive to marketeers, the meat of this script has been split off into files for reuse elsewhere. If you ever need to download a file again, simply import an existing function rather than copying code with cut-and-paste editing. Changes in download operations would need to be made in only one file, not everywhere we've copied boilerplate code; getfile.getfile could even be changed to use urllib instead of ftplib without effecting any of its clients. It's good engineering.

11.2.2.1 Download utility

So just how would we go about writing such an FTP interface wrapper (he asks, knowingly)? Given the ftplib library module, wrapping downloads of a particular file in a particular directory is straightforward. Connected FTP objects support two download methods:

The retrbinary method downloads the requested file in binary mode, sending its bytes in chunks to a supplied function, without line-feed mapping. Typically, the supplied function is a write method of an open local file object, such that the bytes are placed in the local file on the client.
The retrlines method downloads the requested file in ASCII text mode, sending each line of text to a supplied function with all end-of-line characters stripped. Typically, the supplied function adds a \n newline (mapped appropriately for the client machine), and writes the line to a local file.

We will meet the retrlines method in a later example; the getfile utility module in Example 11-5 transfers in binary mode always with retrbinary. That is, files are downloaded exactly as they were on the server, byte for byte, with the server's line-feed conventions in text files. You may need to convert line-feeds after downloads if they look odd in your text editor -- see the converter tools in Chapter 5, for pointers.

Example 11-5. PP2E\Internet\Ftp\getfile.py

#!/usr/local/bin/python
################################################# 
# Fetch an arbitrary file by ftp.  Anonymous 
# ftp unless you pass a user=(name, pswd) tuple.
# Gets the Monty Python theme song by default.
#################################################
 
from ftplib  import FTP          # socket-based ftp tools
from os.path import exists       # file existence test

file = 'sousa.au'                # default file coordinates
site = 'ftp.python.org'          # monty python theme song
dir  = 'pub/python/misc'

def getfile(file=file, site=site, dir=dir, user=(), verbose=1, force=0):
    """
    fetch a file by ftp from a site/directory
    anonymous or real login, binary transfer
    """
    if exists(file) and not force:
        if verbose: print file, 'already fetched'
    else:
        if verbose: print 'Downloading', file
        local = open(file, 'wb')                # local file of same name
        try:
            remote = FTP(site)                  # connect to ftp site
            apply(remote.login, user)           # anonymous=() or (name, pswd)
            remote.cwd(dir)
            remote.retrbinary('RETR ' + file, local.write, 1024)
            remote.quit()
        finally:
            local.close()                       # close file no matter what
        if verbose: print 'Download done.'      # caller handles exceptions

if __name__ == '__main__': getfile()            # anonymous python.org login

This module is mostly just a repackaging of the FTP code we used to fetch the Python source distribution earlier, to make it simpler and reusable. Because it is a callable function, the exported getfile.getfile here tries to be as robust and generally useful as possible, but even a function this small implies some design decisions. Here are a few usage notes:

FTP mode: The getfile function in this script runs in anonymous FTP mode by default, but a two-item tuple containing a username and password string may be passed to the user argument to log in to the remote server in non-anonymous mode. To use anonymous FTP, either don't pass the user argument or pass it an empty tuple, (). The FTP object login method allows two optional arguments to denote a username and password, and the apply call in Example 11-5 sends it whatever argument tuple you pass to user.
Processing modes: If passed, the last two arguments (verbose, force) allow us to turn off status messages printed to the stdout stream (perhaps undesirable in a GUI context) and force downloads to happen even if the file already exists locally (the download overwrites the existing local file).
Exception protocol: The caller is expected to handle exceptions; this function wraps downloads in a try/finally statement to guarantee that the local output file is closed, but lets exceptions propagate. If used in a GUI or run from a thread, for instance, exceptions may require special handling unknown in this file.
Self-test: If run standalone, this file downloads a sousa.au audio file from http://www.python.org as a self-test, but the function will normally be passed FTP filenames, site names, and directory names as well.
File mode: This script is careful to open the local output file in "wb" binary mode to suppress end-line mapping, in case it is run on Windows. As we learned in Chapter 2, it's not impossible that true binary data files may have bytes whose value is equal to a \n line-feed character; opening in "w" text mode instead would make these bytes be automatically expanded to a \r\n two-byte sequence when written locally on Windows. This is only an issue for portability to Windows (mode "w" works elsewhere). Again, see Chapter 5 for line-feed converter tools.
Directory model: This function currently uses the same filename to identify both the remote file and the local file where the download should be stored. As such, it should be run in the directory where you want the file to show up; use os.chdir to move to directories if needed. (We could instead assume filename is the local file's name, and strip the local directory with os.path.split to get the remote name, or accept two distinct filename arguments -- local and remote.)

Notice also that, despite its name, this module is very different than the getfile.py script we studied at the end of the sockets material in the previous chapter. The socket-based getfile implemented client and server-side logic to download a server file to a client machine over raw sockets.

This new getfile here is a client-side tool only. Instead of raw sockets, it uses the simpler FTP protocol to request a file from a server; all socket-level details are hidden in the ftplib module's implementation of the FTP client protocol. Furthermore, the server here is a perpetually running program on the server machine, which listens for and responds to FTP requests on a socket, on the dedicated FTP port (number 21). The net functional effect is that this script requires an FTP server to be running on the machine where the desired file lives, but such a server is much more likely to be available.

11.2.2.2 Upload utility

While we're at it, let's write a script to upload a single file by FTP to a remote machine. The upload interfaces in the FTP module are symmetric with the download interfaces. Given a connected FTP object:

Its storbinary method can be used to upload bytes from an open local file object.
Its storlines method can be used to upload text in ASCII mode from an open local file object.

Unlike the download interfaces, both of these methods are passed a file object as a whole, not a file object method (or other function). We will meet the storlines method in a later example. The utility module in Example 11-6 uses storbinary such that the file whose name is passed in is always uploaded verbatim -- in binary mode, without line-feed translations for the target machine's conventions. If this script uploads a text file, it will arrive exactly as stored on the machine it came from, client line-feed markers and all.

Example 11-6. PP2E\Internet\Ftp\putfile.py

#!/usr/local/bin/python
################################################## 
# Store an arbitrary file by ftp.  Anonymous 
# ftp unless you pass a user=(name, pswd) tuple.
##################################################
 
import ftplib                    # socket-based ftp tools

file = 'sousa.au'                # default file coordinates
site = 'starship.python.net'     # monty python theme song
dir  = 'upload'

def putfile(file=file, site=site, dir=dir, user=(), verbose=1):
    """
    store a file by ftp to a site/directory
    anonymous or real login, binary transfer
    """
    if verbose: print 'Uploading', file
    local  = open(file, 'rb')               # local file of same name
    remote = ftplib.FTP(site)               # connect to ftp site
    apply(remote.login, user)               # anonymous or real login
    remote.cwd(dir)
    remote.storbinary('STOR ' + file, local, 1024)
    remote.quit()
    local.close()
    if verbose: print 'Upload done.'

if __name__ == '__main__':
    import sys, getpass
    pswd = getpass.getpass(site + ' pswd?')          # filename on cmdline
    putfile(file=sys.argv[1], user=('lutz', pswd))   # non-anonymous login

Notice that for portability, the local file is opened in "rb" binary mode this time to suppress automatic line-feed character conversions in case this is run on Windows; if this is binary information, we don't want any bytes that happen to have the value of the \r carriage-return character to mysteriously go away during the transfer.

Also observe that the standard Python getpass.getpass is used to ask for an FTP password in standalone mode. Like the raw_input built-in function, this call prompts for and reads a line of text from the console user; unlike raw_input, getpass does not echo typed characters on the screen at all (in fact, on Windows it uses the low-level direct keyboard interface we met in the stream redirection section of Chapter 2). This comes in handy for protecting things like passwords from potentially prying eyes.

Like the download utility, this script uploads a local copy of an audio file by default as a self-test, but you will normally pass in real remote filename, site name, and directory name strings. Also like the download utility, you may pass a (username, password) tuple to the user argument to trigger non-anonymous FTP mode (anonymous FTP is the default).

11.2.2.3 Playing the Monty Python theme song

Wake up -- it's time for a bit of fun. Let's make use of these scripts to transfer and play the Monty Python theme song audio file maintained at Python's web site. First off, let's write a module that downloads and plays the sample file, as shown in Example 11-7.

Example 11-7. PP2E\Internet\Ftp\sousa.py

#!/usr/local/bin/python
################################################# 
# Usage: % sousa.py
# Fetch and play the Monty Python theme song.
# This may not work on your system as is: it 
# requires a machine with ftp access, and uses
# audio filters on Unix and your .au player on 
# Windows.  Configure playfile.py as needed.
#################################################
 
import os, sys
from PP2E.Internet.Ftp.getfile  import getfile
from PP2E.Internet.Ftp.playfile import playfile
sample = 'sousa.au'

getfile(sample)    # fetch audio file by ftp
playfile(sample)   # send it to audio player

This script will run on any machine with Python, an Internet link, and a recognizable audio player; it works on my Windows laptop with a dialup Internet connection (if I could insert an audio file hyperlink here to show what it sounds like, I would):

C:\...\PP2E\Internet\Ftp>python sousa.py
Downloading sousa.au
Download done.

C:\...\PP2E\Internet\Ftp>python sousa.py
sousa.au already fetched

The getfile and putfile modules can be used to move the sample file around, too. Both can either be imported by clients that wish to use their functions, or run as top-level programs to trigger self-tests. Let's run these scripts from a command line and the interactive prompt to see how they work. When run standalone, parameters are passed in the command line, and the default file settings are used:

C:\...\PP2E\Internet\Ftp>python putfile.py sousa.au
starship.python.net pswd?
Uploading sousa.au
Upload done.

When imported, parameters are passed explicitly to functions:

C:\...\PP2E\Internet\Ftp>python
>>> from getfile import getfile
>>> getfile(file='sousa.au', site='starship.python.net', dir='upload',
...                          user=('lutz', '****'))
Downloading sousa.au
Download done.
>>> from playfile import playfile
>>> playfile('sousa.au')

I've left one piece out of the puzzle: all that's left is to write a module that attempts to play an audio file portably (see Example 11-8). Alas, this is the least straightforward task because audio players vary per platform. On Windows, the following module uses the DOS start command to launch whatever you have registered to play audio files (exactly as if you had double-clicked on the file's icon in a file explorer); on the Windows 98 side of my Sony notebook machine, this DOS command line:

C:\...\PP2E\Internet\Ftp>python playfile.py sousa.au

pops up a media bar playing the sample. On Unix, it attempts to pass the audio file to a command-line player program, if one has been added to the unixfilter table -- tweak this for your system (cat 'ing audio files to /dev/audio works on some Unix systems, too). On other platforms, you'll need to do a bit more; there has been some work towards portable audio interfaces in Python, but it's notoriously platform-specific. Web browsers generally know how to play audio files, so passing the filename in a URL to a browser located via the LaunchBrowser.py script we met in Chapter 4, is perhaps a portable solution here as well (see that chapter for interface details).

Example 11-8. PP2E\Internet\Ftp\playfile.py

#!/usr/local/bin/python
################################################# 
# Try to play an arbitrary audio file.
# This may not work on your system as is; it
# uses audio filters on Unix, and filename 
# associations on Windows via the start command
# line (i.e., whatever you have on your machine 
# to run *.au files--an audio player, or perhaps
# a web browser); configure me as needed.  We
# could instead launch a web browser here, with 
# LaunchBrowser.py.  See also: Lib/audiodev.py.
#################################################

import os, sys
sample = 'sousa.au'  # default audio file

unixhelpmsg = """
Sorry: can't find an audio filter for your system!
Add an entry for your system to the "unixfilter" 
dictionary in playfile.py, or play the file manually.
"""

unixfilter = {'sunos5':  '/usr/bin/audioplay',
              'linux2':  '<unknown>',
              'sunos4':  '/usr/demo/SOUND/play'}

def playfile(sample=sample):
    """
    play an audio file: use name associations 
    on windows, filter command-lines elsewhere
    """
    if sys.platform[:3] == 'win':
        os.system('start ' + sample)   # runs your audio player
    else:
        if not (unixfilter.has_key(sys.platform) and 
                os.path.exists(unixfilter[sys.platform])):
            print unixhelpmsg
        else:
            theme = open(sample, 'r')             
            audio = os.popen(unixfilter[sys.platform], 'w')  # spawn shell tool
            audio.write(theme.read())                        # send to its stdin

if __name__ == '__main__': playfile()

11.2.2.4 Adding user interfaces

If you read the last chapter, you'll recall that it concluded with a quick look at scripts that added a user interface to a socket-based getfile script -- one that transferred files over a proprietary socket dialog, instead of FTP. At the end of that presentation, I mentioned that FTP is a much more generally useful way to move files around, because FTP servers are so widely available on the Net. For illustration purposes, Example 11-9 shows a simple mutation of the last chapter's user interface, implemented as a new subclass of the last chapter's general form builder.

Example 11-9. P2E\Internet\Ftp\getfilegui.py

###############################################################
# launch ftp getfile function with a reusable form gui class;
# uses os.chdir to goto target local dir (getfile currently
# assumes that filename has no local directory path prefix);
# runs getfile.getfile in thread to allow more than one to be 
# running at once and avoid blocking gui during downloads;
# this differs from socket-based getfilegui, but reuses Form;
# supports both user and anonymous ftp as currently coded;
# caveats: the password field is not displayed as stars here,
# errors are printed to the console instead of shown in the 
# gui (threads can't touch the gui on Windows), this isn't 
# 100% thread safe (there is a slight delay between os.chdir
# here and opening the local output file in getfile) and we 
# could display both a save-as popup for picking the local dir,
# and a remote directory listings for picking the file to get;
###############################################################

from Tkinter import Tk, mainloop
from tkMessageBox import showinfo
import getfile, os, sys, thread                 # ftp getfile here, not socket
from PP2E.Internet.Sockets.form import Form     # reuse form tool in socket dir

class FtpForm(Form):
    def __init__(self):
        root = Tk()
        root.title(self.title)
        labels = ['Server Name', 'Remote Dir', 'File Name', 
                  'Local Dir',   'User Name?', 'Password?']
        Form.__init__(self, labels, root)
        self.mutex = thread.allocate_lock()
        self.threads = 0
    def transfer(self, filename, servername, remotedir, userinfo):
        try:
            self.do_transfer(filename, servername, remotedir, userinfo)
            print '%s of "%s" successful'  % (self.mode, filename)
        except:
            print '%s of "%s" has failed:' % (self.mode, filename),
            print sys.exc_info()[0], sys.exc_info()[1]
        self.mutex.acquire()
        self.threads = self.threads - 1
        self.mutex.release()
    def onSubmit(self):
        Form.onSubmit(self)
        localdir   = self.content['Local Dir'].get()
        remotedir  = self.content['Remote Dir'].get()
        servername = self.content['Server Name'].get()
        filename   = self.content['File Name'].get()
        username   = self.content['User Name?'].get()
        password   = self.content['Password?'].get()
        userinfo   = ()
        if username and password:
            userinfo = (username, password)
        if localdir:
            os.chdir(localdir)
        self.mutex.acquire()
        self.threads = self.threads + 1
        self.mutex.release()
        ftpargs = (filename, servername, remotedir, userinfo)
        thread.start_new_thread(self.transfer, ftpargs)
        showinfo(self.title, '%s of "%s" started' % (self.mode, filename))
    def onCancel(self):
        if self.threads == 0:
            Tk().quit()
        else:
            showinfo(self.title, 
                     'Cannot exit: %d threads running' % self.threads)

class FtpGetfileForm(FtpForm):
    title = 'FtpGetfileGui'
    mode  = 'Download'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        getfile.getfile(filename, servername, remotedir, userinfo, 0, 1)

if __name__ == '__main__':
    FtpGetfileForm()
    mainloop()

If you flip back to the end of the previous chapter, you'll find that this version is similar in structure to its counterpart there; in fact, it has the same name (and is distinct only because it lives in a different directory). The class here, though, knows how to use the FTP-based getfile module from earlier in this chapter, instead of the socket-based getfile module we met a chapter ago. When run, this version also implements more input fields, as we see in Figure 11-1.

Figure 11-1. FTP getfile input form

figs/ppy2_1101.gif

Notice that a full file path is entered for the local directory here. Otherwise, the script assumes the current working directory, which changes after each download and can vary depending on where the GUI is launched (e.g., the current directory differs when this script is run by the PyDemos program at the top of the examples tree). When we click this GUI's Submit button (or press the Enter key), this script simply passes the form's input field values as arguments to the getfile.getfile FTP utility function shown earlier in this section. It also posts a pop-up to tell us the download has begun (Figure 11-2).

Figure 11-2. FTP getfile info pop-up

figs/ppy2_1102.gif

As currently coded, further download status messages from this point on show up in the console window; here are the messages for a successful download, as well as one that failed when I mistyped my password (no, it's not really "xxxxxx"):

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      index.html
Remote Dir      =>      public_html/home
Download of "index.html" successful

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      index.html
Remote Dir      =>      public_html/home
Download of "index.html" has failed: ftplib.error_perm 530 Login incorrect.

Given a username and password, the downloader logs into the specified account. To do anonymous FTP instead, leave the username and password fields blank. Let's start an anonymous FTP connection to fetch the Python source distribution; Figure 11-3 shows the filled-out form.

Figure 11-3. FTP getfile input form, anonymous FTP

figs/ppy2_1103.gif

Pressing Submit on this form starts a download running in the background as before; we get the pop-up shown in Figure 11-4 to verify the startup.

Figure 11-4. FTP getfile info pop-up

figs/ppy2_1104.gif

Now, to illustrate the threading capabilities of this GUI, let's start another download while this one is in progress. The GUI stays active while downloads are under way, so we simply change the input fields and press Submit again, as done in Figure 11-5.

Figure 11-5. FTP getfile input form, second thread

figs/ppy2_1105.gif

This second download starts in parallel with the one attached to ftp.python.org, because each download is run in a thread, and more than one Internet connection can be active at once. In fact, the GUI itself stays active during downloads only because downloads are run in threads; if they were not, even screen redraws wouldn't happen until a download finished.

We discussed threads in Chapter 3, but this script illustrates some practical thread concerns:

This program takes care to not do anything GUI-related in a download thread. At least in the current release on Windows, only the thread that makes GUIs can process them (a Windows-only rule that has nothing to do with Python or Tkinter).
To avoid killing spawned download threads on some platforms, the GUI must also be careful to not exit while any downloads are in progress. It keeps track of the number of in-progress threads, and just displays the pop-up in Figure 11-6 if we try to kill the GUI while both of these downloads are in progress by pressing the Cancel button.

Figure 11-6. FTP getfile busy pop-up

figs/ppy2_1106.gif

We'll see ways to work around the no-GUI rule for threads when we explore the PyMailGui example near the end of this chapter. To be portable, though, we can't really close the GUI until the active-thread count falls to zero. Here is the sort of output that appears in the console window for these two downloads:

C:\...\PP2E\Internet\Ftp>python getfilegui.py
User Name?      =>
Server Name     =>      ftp.python.org
Local Dir       =>      c:\temp
Password?       =>
File Name       =>      python1.5.tar.gz
Remote Dir      =>      pub/python/src

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\temp
Password?       =>      xxxxxx
File Name       =>      about-pp.html
Remote Dir      =>      public_html/home
Download of "about-pp.html" successful
Download of "python1.5.tar.gz" successful

This all isn't much more useful than a command-line-based tool, of course, but it can be easily modified by changing its Python code, and it provides enough of a GUI to qualify as a simple, first-cut FTP user interface. Moreover, because this GUI runs downloads in Python threads, more than one can be run at the same time from this GUI without having to start or restart a different FTP client tool.

While we're in a GUI mood, let's add a simple interface to the putfile utility, too. The script in Example 11-10 creates a dialog that starts uploads in threads. It's almost the same as the getfile GUI we just wrote, so there's nothing new to say. In fact, because get and put operations are so similar from an interface perspective, most of the get form's logic was deliberately factored out into a single generic class (FtpForm) such that changes need only be made in a single place. That is, the put GUI here is mostly just a reuse of the get GUI, with distinct output labels and transfer method. It's in a file by itself to make it easy to launch as a standalone program.

Example 11-10. PP2E\Internet\Ftp\putfilegui.py

###############################################################
# launch ftp putfile function with a reusable form gui class;
# see getfilegui for notes: most of the same caveats apply;
# the get and put forms have been factored into a single 
# class such that changes need only be made in one place;
###############################################################

from Tkinter import mainloop
import putfile, getfilegui

class FtpPutfileForm(getfilegui.FtpForm):
    title = 'FtpPutfileGui'
    mode  = 'Upload'
    def do_transfer(self, filename, servername, remotedir, userinfo):
        putfile.putfile(filename, servername, remotedir, userinfo, 0)

if __name__ == '__main__':
    FtpPutfileForm()
    mainloop()

Running this script looks much like running the download GUI, because it's almost entirely the same code at work. Let's upload a couple of files from the client machine to the starship server; Figure 11-7 shows the state of the GUI while starting one.

Figure 11-7. FTP putfile input form

figs/ppy2_1107.gif

And here is the console window output we get when uploading two files in parallel; here again, uploads run in threads, so if we start a new upload before one in progress is finished, they overlap in time:

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\stuff\website\public_html
Password?       =>      xxxxxx
File Name       =>      about-pp2e.html
Remote Dir      =>      public_html

User Name?      =>      lutz
Server Name     =>      starship.python.net
Local Dir       =>      c:\stuff\website\public_html
Password?       =>      xxxxxx
File Name       =>      about-ppr2e.html
Remote Dir      =>      public_html
Upload of "about-pp2e.html" successful
Upload of "about-ppr2e.html" successful

Finally, we can bundle up both GUIs in a single launcher script that knows how to start the get and put interfaces, regardless of which directory we are in when the script is started, and independent of the platform on which it runs. Example 11-11 shows this process.

Example 11-11. PP2E\Internet\Ftp\PyFtpGui.pyw

################################################################
# spawn ftp get and put guis no matter what dir I'm run from;
# os.getcwd is not necessarily the place this script lives;
# could also hard-code a path from $PP2EHOME, or guessLocation;
# could also do this but need the DOS popup for status messages:
# from PP2E.launchmodes import PortableLauncher
# PortableLauncher('getfilegui', '%s/getfilegui.py' % mydir)()
################################################################

import os, sys
from PP2E.Launcher import findFirst
mydir = os.path.split(findFirst(os.curdir, 'PyFtpGui.pyw'))[0]

if sys.platform[:3] == 'win':
    os.system('start %s/getfilegui.py' % mydir)
    os.system('start %s/putfilegui.py' % mydir)
else:
    os.system('python %s/getfilegui.py &' % mydir)
    os.system('python %s/putfilegui.py &' % mydir)

When this script is started, both the get and put GUIs appear as distinct, independently running programs; alternatively, we might attach both forms to a single interface. We could get much fancier than these two interfaces, of course. For instance, we could pop up local file selection dialogs, and we could display widgets that give status of downloads and uploads in progress. We could even list files available at the remote site in a selectable list box by requesting remote directory listings over the FTP connection. To learn how to add features like that, though, we need to move on to the next section.

11.2.3 Downloading Web Sites (Mirrors)

Once upon a time, Telnet was all I needed. My web site lived at an Internet Service Provider (ISP) that provided general and free Telnet access for all its customers. It was a simple time. All of my site's files lived only in one place -- at my account directory on my ISP's server machine. To make changes to web pages, I simply started a Telnet session connected to my ISP's machine and edited my web pages there online. Moreover, because Telnet sessions can be run from almost any machine with an Internet link, I was able to tweak my web pages everywhere -- from my PC, from machines I had access to on the training road, from archaic machines I played with when I was bored at my day job, and so on. Life was good.

But times have changed. Due to a security breach, my ISP made a blanket decision to revoke Telnet access from all of their customers (except, of course, those who elected to pay a substantial premium to retain it). Seemingly, we weren't even supposed to have known about Telnet in the first place. As a replacement, the ISP mandated that all Telnet-inclined users should begin maintaining web page files locally on their own machines, and upload them by FTP after every change.

That's nowhere near as nice as editing files kept in a single place from almost any computer on the planet, of course, and this triggered plenty of complaints and cancellations among the technically savvy. Unfortunately, the technically savvy is a financially insignificant subset; more to the point, my web page's address had by this time been published in multiple books sold around the world, so changing ISPs would have been no less painful than changing update procedures.

After the shouting, it dawned on me that Python could help here: by writing Python scripts to automate the upload and download tasks associated with maintaining my web site on my PC, I could at least get back some of the mobility and ease of use that I'd lost. Because Python FTP scripts will work on any machine with sockets, I could run them both on my PC and on nearly any other computer where Python was installed. Furthermore, the same scripts used to transfer page files to and from my PC could be used to copy ("mirror") my site to another web server as a backup copy, should my ISP experience an outage (trust me -- it happens).

The following two scripts were born of all of the above frustrations. The first, mirrorflat.py, automatically downloads (i.e., copies) by FTP all the files in a directory at a remote site, to a directory on the local machine. I keep the main copy of my web site files on my PC these days, but really use this script in two ways:

To download my web site to client machines where I want to make edits, I fetch the contents of my public_html web directory of my account on my ISP's machine.
To mirror my site to my account on the starship.python.net server, I run this script periodically from a Telnet session on the starship machine (as I wrote this, starship still clung to the radical notion that users are intelligent enough to run Telnet).

More generally, this script (shown in Example 11-12) will download a directory full of files to any machine with Python and sockets, from any machine running an FTP server.

Example 11-12. PP2E\Internet\Ftp\mirrorflat.py

#!/bin/env python 
###########################################################
# use ftp to copy (download) all files from a remote site
# and directory to a directory on the local machine; e.g., 
# run me periodically to mirror a flat ftp site;
###########################################################

import os, sys, ftplib
from getpass import getpass

remotesite = 'home.rmi.net'
remotedir  = 'public_html'
remoteuser = 'lutz'
remotepass = getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input() # clear stream
cleanall   = raw_input('Clean local directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for localname in os.listdir(localdir):          # try to delete all locals
        try:                                        # first to remove old files
            print 'deleting local', localname
            os.remove(os.path.join(localdir, localname))
        except:
            print 'cannot delete local', localname

count = 0                                           # download all remote files
remotefiles = connection.nlst()                     # nlst() gives files list
                                                    # dir()  gives full details
for remotename in remotefiles:
    localname = os.path.join(localdir, remotename) 
    print 'copying', remotename, 'to', localname
    if remotename[-4:] == 'html' or remotename[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localname, 'w')
        callback  = lambda line, file=localfile: file.write(line + '\n')
        connection.retrlines('RETR ' + remotename, callback)
    else:
        # use binary mode xfer
        localfile = open(localname, 'wb')
        connection.retrbinary('RETR ' + remotename, localfile.write)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files downloaded.'

There is not a whole lot new to speak of in this script, compared to other FTP examples we've seen thus far. We open a connection with the remote FTP server, log in with a username and password for the desired account (this script never uses anonymous FTP), and go to the desired remote directory. New here, though, are loops to iterate over all the files in local and remote directories, text-based retrievals, and file deletions:

Deleting all local files

This script has a cleanall option, enabled by interactive prompt. If selected, the script first deletes all the files in the local directory before downloading, to make sure there are no extra files there that aren't also on the server (there may be junk here from a prior download). To delete local files, the script calls os.listdir to get a list of filenames in the directory, and os.remove to delete each; see Chapter 2 earlier in this book (or the Python library manual) for more details if you've forgotten what these calls do.

Notice the use of os.path.join to concatenate a directory path and filename according to the host platform's conventions; os.listdir returns filenames without their directory paths, and this script is not necessarily run in the local directory where downloads will be placed. The local directory defaults to the current directory ("."), but can be set differently with a command-line argument to the script.

Fetching all remote files

To grab all the files in a remote directory, we first need a list of their names. The FTP object's nlst method is the remote equivalent of os.listdir: nlist returns a list of the string names of all files in the current remote directory. Once we have this list, we simply step through it in a loop, running FTP retrieval commands for each filename in turn (more on this in a minute).

The nlst method is, more or less, like requesting a directory listing with an ls command in typical interactive FTP programs, but Python automatically splits up the listing's text into a list of filenames. We can pass it a remote directory to be listed; by default it lists the current server directory. A related FTP method, dir, returns the list of line strings produced by an FTP LIST command; its result is like typing a dir command in an FTP session, and its lines contain complete file information, unlike nlst. If you need to know more about all the remote files, parse the result of a dir method call.

Text-based retrievals

To keep line-feeds in sync with the machines that my web files live on, this script distinguishes between binary and text files. It uses a simple heuristic to do so: filenames ending in .html or .txt are assumed to be ASCII text data (HTML web pages and simple text files), and all others are assumed to be binary files (e.g., GIF and JPEG images, audio files, tar archives). This simple rule won't work for every web site, but it does the trick at mine.

Binary files are pulled down with the retrbinary method we met earlier and a local open mode of "wb" to suppress line-feed byte mapping (this script may be run on Windows or Unix-like platforms). We don't use a chunk size third argument here, though -- it defaults to a reasonable 8K if omitted.

For ASCII text files, the script instead uses the retrlines method, passing in a function to be called for each line in the text file downloaded. The text line handler function mostly just writes the line to a local file. But notice that the handler function created by the lambda here also adds an \n newline character to the end of the line it is passed. Python's retrlines method strips all line-feed characters from lines to side-step platform differences. By adding an \n, the script is sure to add the proper line-feed marker character sequence for the local platform on which this script runs (\n or \r\n). For this automapping of the \n in the script to work, of course, we must also open text output files in "w" text mode, not "wb" -- the mapping from \n to \r\n on Windows happens when data is written to the file.

All of this is simpler in action than in words. Here is the command I use to download my entire web site from my ISP server account to my Windows 98 laptop PC, in a single step:

C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py 
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
copying UPDATES to .\UPDATES
copying PythonPowered.gif to .\PythonPowered.gif
copying Pywin.gif to .\Pywin.gif
copying PythonPoweredAnim.gif to .\PythonPoweredAnim.gif
copying PythonPoweredSmall.gif to .\PythonPoweredSmall.gif
copying about-hopl.html to .\about-hopl.html
copying about-lp.html to .\about-lp.html
...
 ...lines deleted...
...
copying training.html to .\training.html
copying trainingCD.GIF to .\trainingCD.GIF
copying uk-1.jpg to .\uk-1.jpg
copying uk-2.jpg to .\uk-2.jpg
copying uk-3.jpg to .\uk-3.jpg
copying whatsnew.html to .\whatsnew.html
copying whatsold.html to .\whatsold.html
copying xlate-lp.html to .\xlate-lp.html
copying uploadflat.py to .\uploadflat.py
copying ora-lp-france.gif to .\ora-lp-france.gif
Done: 130 files downloaded.

This can take awhile to complete (it's bound by network speed constraints), but it is much more accurate and easy than downloading files by hand. The script simply iterates over all the remote files returned by the nlst method, and downloads each with the FTP protocol (i.e., over sockets) in turn. It uses text transfer mode for names that imply obviously text data, and binary mode for others.

With the script running this way, I make sure the initial assignments in it reflect the machines involved, and then run the script from the local directory where I want the site copy to be stored. Because the download directory is usually not where the script lives, I need to give Python the full path to the script file (%X% evaluates a shell variable containing the top-level path to book examples on my machine). When run on the starship server in a Telnet session window, the execution and script directory paths are different, but the script works the same way.

If you elect to delete local files in the download directory, you may also see a batch of "deleting local..." messages scroll by on the screen before any "copying..." lines appear:

...
deleting local uploadflat.py
deleting local whatsnew.html
deleting local whatsold.html
deleting local xlate-lp.html
deleting local old-book.html
deleting local about-pp2e.html
deleting local about-ppr2e.html
deleting local old-book2.html
deleting local mirrorflat.py
...
copying about-pp-japan.html to ./about-pp-japan.html
copying about-pp.html to ./about-pp.html
copying about-ppr-germany.html to ./about-ppr-germany.html
copying about-ppr-japan.html to ./about-ppr-japan.html
copying about-ppr-toc.html to ./about-ppr-toc.html
...

By the way, if you botch the input of the remote site password, a Python exception is raised; I sometimes need to run again (and type slower):

C:\Stuff\Website\public_html>python %X%\internet\ftp\mirrorflat.py
Please enter password for home.rmi.net:
Clean local directory first?
connecting...
Traceback (innermost last):
  File "C:\PP2ndEd\examples\PP2E\internet\ftp\mirrorflat.py", line 20, in ?
    connection.login(remoteuser, remotepass)            # login as user/pass..
  File "C:\Program Files\Python\Lib\ftplib.py", line 316, in login
    if resp[0] == '3': resp = self.sendcmd('PASS ' + passwd)
  File "C:\Program Files\Python\Lib\ftplib.py", line 228, in sendcmd
    return self.getresp()
  File "C:\Program Files\Python\Lib\ftplib.py", line 201, in getresp
    raise error_perm, resp
ftplib.error_perm: 530 Login incorrect.

It's worth noting that this script is at least partially configured by assignments near the top of the file. In addition, the password and deletion options are given by interactive inputs, and one command-line argument is allowed -- the local directory name to store the downloaded files (it defaults to ".", the directory where the script is run). Command-line arguments could be employed to universally configure all the other download parameters and options, too; but because of Python's simplicity and lack of compile/link steps, changing settings in the text of Python scripts is usually just as easy as typing words on a command line.

Windows input note : If you study the previous code closely, you'll notice that an extra raw_input call is made on Windows only, after the getpass password input call and before the cleanall option setting is input. This is a workaround for what seems like a bug in Python 1.5.2 for Windows.

Oddly, the Windows port sometimes doesn't synchronize command-line input and output streams as expected. Here, this seems to be due to a getpass bug or constraint -- because getpass uses the low-level msvcrt keyboard interface module we met in Chapter 2, it appears to not mix well with the stdin stream buffering used by raw_input, and botches the input stream in the process. The extra raw_input clears the input stream (sys.stdin.flush doesn't help).

In fact, without the superfluous raw_input for Windows, this script prompts for cleanall option input, but never stops to let you type a reply! This effectively disables cleanall altogether. To force distinct input and output lines and correct raw_input behavior, some scripts in this book run extra print statements or raw_input calls to sync up streams before further user interaction. There may be other fixes, and this may be improved in future releases; try this script without the extra raw_input to see if this has been repaired in your Python.

11.2.4 Uploading Web Sites

Uploading a full directory is symmetric to downloading: it's mostly a matter of swapping the local and remote machines and operations in the program we just met. The script in Example 11-13 uses FTP to copy all files in a directory on the local machine on which it runs, up to a directory on a remote machine.

I really use this script, too, most often to upload all of the files maintained on my laptop PC to my ISP account in one fell swoop. I also sometimes use it to copy my site from my PC to its starship mirror machine, or from the mirror machine back to my ISP. Because this script runs on any computer with Python and sockets, it happily transfers a directory from any machine on the Net to any machine running an FTP server. Simply change the initial setting in this module as appropriate for the transfer you have in mind.

Example 11-13. PP2E\Internet\Ftp\uploadflat.py

#!/bin/env python 
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# e.g., run me to copy a web/ftp site's files from your PC to your ISP;
# assumes a flat directory upload: uploadall.py does nested directories.
# to go to my ISP, I change setting to 'home.rmi.net', and 'public_html'.
##########################################################################

import os, sys, ftplib, getpass

remotesite = 'starship.python.net'                  # upload to starship site
remotedir  = 'public_html/home'                     # from win laptop or other
remoteuser = 'lutz'
remotepass = getpass.getpass('Please enter password for %s: ' % remotesite)
localdir   = (len(sys.argv) > 1 and sys.argv[1]) or '.'
if sys.platform[:3] == 'win': raw_input()           # clear stream
cleanall   = raw_input('Clean remote directory first? ')[:1] in ['y', 'Y']

print 'connecting...'
connection = ftplib.FTP(remotesite)                 # connect to ftp site
connection.login(remoteuser, remotepass)            # login as user/password
connection.cwd(remotedir)                           # cd to directory to copy

if cleanall:
    for remotename in connection.nlst():            # try to delete all remotes
        try:                                        # first to remove old files
            print 'deleting remote', remotename
            connection.delete(remotename)           
        except:                                     
            print 'cannot delete remote', remotename

count = 0
localfiles = os.listdir(localdir)                   # upload all local files
                                                    # listdir() strips dir path
for localname in localfiles:  
    localpath = os.path.join(localdir, localname) 
    print 'uploading', localpath, 'to', localname
    if localname[-4:] == 'html' or localname[-3:] == 'txt':
        # use ascii mode xfer
        localfile = open(localpath, 'r')
        connection.storlines('STOR ' + localname, localfile)
    else:
        # use binary mode xfer
        localfile = open(localpath, 'rb')
        connection.storbinary('STOR ' + localname, localfile, 1024)
    localfile.close()
    count = count+1

connection.quit()
print 'Done:', count, 'files uploaded.'

Like the mirror download script, the program here illustrates a handful of new FTP interfaces and a set of FTP scripting techniques:

Deleting all remote files

Just like the mirror script, the upload begins by asking if we want to delete all the files in the remote target directory before copying any files there. This cleanall option is useful if we've deleted files in the local copy of the directory in the client -- the deleted files would remain on the server-side copy unless we delete all files there first. To implement the remote cleanup, this script simply gets a listing of all the files in the remote directory with the FTP nlst method, and deletes each in turn with the FTP delete method. Assuming we have delete permission, the directory will be emptied (file permissions depend on the account we logged into when connecting to the server). We've already moved to the target remote directory when deletions occur, so no directory paths must be prepended to filenames here.

Storing all local files

To apply the upload operation to each file in the local directory, we get a list of local filenames with the standard os.listdir call, and take care to prepend the local source directory path to each filename with the os.path.join call. Recall that os.listdir returns filenames without directory paths, and the source directory may not be the same as the script's execution directory if passed on the command line.

Text-based uploads

This script may be run on both Windows and Unix-like clients, so we need to handle text files specially. Like the mirror download, this script picks text or binary transfer modes by inspecting each filename's extension -- HTML and text files are moved in FTP text mode. We've already met the storbinary FTP object method used to upload files in binary mode -- an exact, byte-for-byte copy appears at the remote site.

Text mode transfers work almost identically: the storlines method accepts an FTP command string and a local file (or file-like) object opened in text mode, and simply copies each line in the local file to a same-named file on the remote machine. As usual, if we run this script on Windows, opening the input file in "r" text mode means that DOS-style \r\n end-of-line sequences are mapped to the \n character as lines are read. When the script is run on Unix and Linux, lines end in a single \n already, so no such mapping occurs. The net effect is that data is read portably, with \n characters to represent end-of-line. For binary files, we open in "rb" mode to suppress such automatic mapping everywhere (we don't want bytes that happen to have the same value as \r to magically disappear when read on Windows).^[3]

As for the mirror download script, this program simply iterates over all files to be transferred (files in the local directory listing this time), and transfers each in turn -- in either text or binary mode, depending on the files' names. Here is the command I use to upload my entire web site from my laptop Windows 98 PC to the remote Unix server at my ISP, in a single step:

C:\Stuff\Website\public_html>python %X%\Internet\Ftp\uploadflat.py 
Please enter password for starship.python.net:
Clean remote directory first?
connecting...
uploading .\LJsuppcover.jpg to LJsuppcover.jpg
uploading .\PythonPowered.gif to PythonPowered.gif
uploading .\PythonPoweredAnim.gif to PythonPoweredAnim.gif
uploading .\PythonPoweredSmall.gif to PythonPoweredSmall.gif
uploading .\Pywin.gif to Pywin.gif
uploading .\UPDATES to UPDATES
uploading .\about-hopl.html to about-hopl.html
uploading .\about-lp.html to about-lp.html
uploading .\about-pp-japan.html to about-pp-japan.html
...
 ...lines deleted...
...
uploading .\trainingCD.GIF to trainingCD.GIF
uploading .\uk-1.jpg to uk-1.jpg
uploading .\uk-2.jpg to uk-2.jpg
uploading .\uk-3.jpg to uk-3.jpg
uploading .\uploadflat.py to uploadflat.py
uploading .\whatsnew.html to whatsnew.html
uploading .\whatsold.html to whatsold.html
uploading .\xlate-lp.html to xlate-lp.html
Done: 131 files uploaded.

Like the mirror example, I usually run this command from the local directory where my web files are kept, and I pass Python the full path to the script. When I run this on the starship Linux server, it works the same, but the paths to the script and my web files directory differ. If you elect to clean the remote directory before uploading, you'll get a bunch of "deleting remote..." messages before the "uploading..." lines here, too:

...
deleting remote uk-3.jpg
deleting remote whatsnew.html
deleting remote whatsold.html
deleting remote xlate-lp.html
deleting remote uploadflat.py
deleting remote ora-lp-france.gif
deleting remote LJsuppcover.jpg
deleting remote sonyz505js.gif
deleting remote pic14.html
...

11.2.5 Uploads with Subdirectories

Perhaps the biggest limitation of the web site download and upload scripts we just met are that they assume the site directory is flat (hence their names) -- i.e., both transfer simple files only, and neither handles nested subdirectories within the web directory to be transferred.

For my purposes, that's a reasonable constraint. I avoid nested subdirectories to keep things simple, and I store my home web site as a simple directory of files. For other sites (including one I keep at the starship machine), site transfer scripts are easier to use if they also automatically transfer subdirectories along the way.

It turns out that supporting directories is fairly simple -- we need to add only a bit of recursion and remote directory creation calls. The upload script in Example 11-14 extends the one we just saw, to handle uploading all subdirectories nested within the transferred directory. Furthermore, it recursively transfers subdirectories within subdirectories -- the entire directory tree contained within the top-level transfer directory is uploaded to the target directory at the remote server.

Example 11-14. PP2E\Internet\Ftp\uploadall.py

#!/bin/env python
##########################################################################
# use ftp to upload all files from a local dir to a remote site/directory;
# this version supports uploading nested subdirectories too, but not the 
# cleanall option (that requires parsing ftp listings to detect remote 
# dirs, etc.);  to upload subdirectories, uses os.path.isdir(path) to see 
# if a local file is really a directory, FTP().mkd(path) to make the dir
# on the remote machine (wrapped in a try in case it already exists there), 
# and recursion to upload all files/dirs inside the nested subdirectory. 
# see also: uploadall-2.py, which doesn't assume the topremotedir exists.
##########################################################################
  
import os, sys, ftplib
from getpass import getpass

remotesite   = 'home.rmi.net'          # upload from pc or starship to rmi.net
topremotedir = 'public_html' 
remoteuser   = 'lutz'
remotepass   = getpass('Please enter password for %s: ' % remotesite)
toplocaldir  = (len(sys.argv) > 1 and sys.argv[1]) or '.'

print 'connecting...'
connection = ftplib.FTP(remotesite)               # connect to ftp site
connection.login(remoteuser, remotepass)          # login as user/password
connection.cwd(topremotedir)                      # cd to directory to copy to
                                                  # assumes topremotedir exists
def uploadDir(localdir):
    global fcount, dcount
    localfiles = os.listdir(localdir)
    for localname in localfiles:  
        localpath = os.path.join(localdir, localname) 
        print 'uploading', localpath, 'to', localname
        if os.path.isdir(localpath):
            # recur into subdirs
            try:
                connection.mkd(localname)
                print localname, 'directory created'
            except: 
                print localname, 'directory not created'
            connection.cwd(localname)
            uploadDir(localpath)
            connection.cwd('..')
            dcount = dcount+1
        else:
            if localname[-4:] == 'html' or localname[-3:] == 'txt':
                # use ascii mode xfer
                localfile = open(localpath, 'r')
                connection.storlines('STOR ' + localname, localfile)
            else:
                # use binary mode xfer
                localfile = open(localpath, 'rb')
                connection.storbinary('STOR ' + localname, localfile, 1024)
            localfile.close()
            fcount = fcount+1

fcount = dcount = 0
uploadDir(toplocaldir)
connection.quit()
print 'Done:', fcount, 'files and', dcount, 'directories uploaded.'

Like the flat upload script, this one can be run on any machine with Python and sockets and upload to any machine running an FTP server; I run it both on my laptop PC and on starship by Telnet to upload sites to my ISP.

In the interest of space, I'll leave studying this variant in more depth as a suggested exercise. Two quick pointers, though:

The crux of the matter here is the os.path.isdir test near the top; if this test detects a directory in the current local directory, we create a same-named directory on the remote machine with connection.mkd and descend into it with connection.cwd, and recur into the subdirectory on the local machine. Like all FTP object methods, mkd and cwd methods issue FTP commands to the remote server. When we exit a local subdirectory, we run a remote cwd('..') to climb to the remote parent directory and continue. The rest of the script is roughly the same as the original.
Note that this script handles only directory tree uploads; recursive uploads are generally more useful than recursive downloads, if you maintain your web sites on your local PC and upload to a server periodically, as I do. If you also want to download (mirror) a web site that has subdirectories, see the mirror scripts in the Python source distribution's Tools directory (currently, at file location Tools/scripts/ftpmirror.py). It's not much extra work, but requires parsing the output of a remote listing command to detect remote directories, and that is just complicated enough for me to omit here. For the same reason, the recursive upload script shown here doesn't support the remote directory cleanup option of the original -- such a feature would require parsing remote listings as well.

For more context, also see the uploadall-2.py version of this script in the examples distribution; it's similar, but coded so as not to assume that the top-level remote directory already exists.

11.3 Processing Internet Email

Some of the other most common higher-level Internet protocols have to do with reading and sending email messages: POP and IMAP for fetching email from servers,^[4] SMTP for sending new messages, and other formalisms such as rfc822 for specifying email message contents and format. You don't normally need to know about such acronyms when using common email tools; but internally, programs like Microsoft Outlook talk to POP and SMTP servers to do your bidding.

Like FTP, email ultimately consists of formatted commands and byte streams shipped over sockets and ports (port 110 for POP; 25 for SMTP). But also like FTP, Python has standard modules to simplify all aspects of email processing. In this section, we explore the POP and SMTP interfaces for fetching and sending email at servers, and the rfc822 interfaces for parsing information out of email header lines; other email interfaces in Python are analogous and are documented in the Python library reference manual.

11.3.1 POP: Reading Email

I used to be an old-fashioned guy. I admit it: up until recently, I preferred to check my email by telneting to my ISP and using a simple command-line email interface. Of course, that's not ideal for mail with attachments, pictures, and the like, but its portability is staggering -- because Telnet runs on almost any machine with a network link, I was able to check my mail quickly and easily from anywhere on the planet. Given that I make my living traveling around the world teaching Python classes, this wild accessibility was a big win.

If you've already read the web site mirror scripts sections earlier in this chapter, you've already heard my tale of ISP woe, so I won't repeat it here. Suffice it to say that times have changed on this front too: when my ISP took away Telnet access, they also took away my email access.^[5] Luckily, Python came to the rescue here, too -- by writing email access scripts in Python, I can still read and send email from any machine in the world that has Python and an Internet connection. Python can be as portable a solution as Telnet.

Moreover, I can still use these scripts as an alternative to tools suggested by the ISP, such as Microsoft Outlook. Besides not being a big fan of delegating control to commercial products of large companies, tools like Outlook generally download mail to your PC and delete it from the mail server as soon as you access it. This keeps your email box small (and your ISP happy), but isn't exactly friendly to traveling Python salespeople -- once accessed, you cannot re-access a prior email from any machine except the one where it was initially downloaded to. If you need to see an old email and don't have your PC handy, you're out of luck.

The next two scripts represent one solution to these portability and single-machine constraints (we'll see others in this and later chapters). The first, popmail.py, is a simple mail reader tool, which downloads and prints the contents of each email in an email account. This script is admittedly primitive, but it lets you read your email on any machine with Python and sockets; moreover, it leaves your email intact on the server. The second, smtpmail.py, is a one-shot script for writing and sending a new email message.

11.3.1.1 Mail configuration module

Before we get to either of the two scripts, though, let's first take a look a common module they both import and use. The module in Example 11-15 is used to configure email parameters appropriately for a particular user. It's simply a collection of assignments used by all the mail programs that appear in this book; isolating these configuration settings in this single module makes it easy to configure the book's email programs for a particular user.

If you want to use any of this book's email programs to do mail processing of your own, be sure to change its assignments to reflect your servers, account usernames, and so on (as shown, they refer to my email accounts). Not all of this module's settings are used by the next two scripts; we'll come back to this module at later examples to explain some of the settings here.

Example 11-15. PP2E\Internet\Email\mailconfig.py

################################################################
# email scripts get their server names and other email config
# options from this module: change me to reflect your machine
# names, sig, etc.; could get some from the command line too;
################################################################

#-------------------------------------------
# SMTP email server machine name (send)
#-------------------------------------------

smtpservername = 'smtp.rmi.net'          # or starship.python.net, 'localhost'

#-------------------------------------------
# POP3 email server machine, user (retrieve)
#-------------------------------------------

popservername  = 'pop.rmi.net'           # or starship.python.net, 'localhost'
popusername    = 'lutz'                  # password fetched of asked wehen run

#-------------------------------------------
# local file where pymail saves pop mail
# PyMailGui insead asks with a popup dialog
#-------------------------------------------

savemailfile   = r'c:\stuff\etc\savemail.txt'       # use dialog in PyMailGui

#---------------------------------------------------------------
# PyMailGui: optional name of local one-line text file with your 
# pop password; if empty or file cannot be read, pswd requested 
# when run; pswd is not encrypted so leave this empty on shared 
# machines; PyMailCgi and pymail always ask for pswd when run.
#---------------------------------------------------------------

poppasswdfile  = r'c:\stuff\etc\pymailgui.txt'      # set to '' to be asked

#---------------------------------------------------------------
# personal information used by PyMailGui to fill in forms;
# sig  -- can be a triple-quoted block, ignored if empty string;
# addr -- used for initial value of "From" field if not empty,
# else tries to guess From for replies, with varying success;
#---------------------------------------------------------------

myaddress   = '[email protected]'
mysignature = '--Mark Lutz  (http://rmi.net/~lutz)  [PyMailGui 1.0]'

11.3.1.2 POP mail reader module

On to reading email in Python: the script in Example 11-16 employs Python's standard poplib module, an implementation of the client-side interface to POP -- the Post Office Protocol. POP is just a well-defined way to fetch email from servers over sockets. This script connects to a POP server to implement a simple yet portable email download and display tool.

Example 11-16. PP2E\Internet\Email\popmail.py

#!/usr/local/bin/python
######################################################
# use the Python POP3 mail interface module to view
# your pop email account messages; this is just a 
# simple listing--see pymail.py for a client with
# more user interaction features, and smtpmail.py 
# for a script which sends mail; pop is used to 
# retrieve mail, and runs on a socket using port 
# number 110 on the server machine, but Python's 
# poplib hides all protocol details; to send mail, 
# use the smtplib module (or os.popen('mail...').
# see also: unix mailfile reader in App framework.
######################################################

import poplib, getpass, sys, mailconfig

mailserver = mailconfig.popservername      # ex: 'pop.rmi.net'
mailuser   = mailconfig.popusername        # ex: 'lutz'
mailpasswd = getpass.getpass('Password for %s?' % mailserver)

print 'Connecting...'
server = poplib.POP3(mailserver)
server.user(mailuser)                      # connect, login to mail server
server.pass_(mailpasswd)                   # pass is a reserved word

try:
    print server.getwelcome()              # print returned greeting message 
    msgCount, msgBytes = server.stat()
    print 'There are', msgCount, 'mail messages in', msgBytes, 'bytes'
    print server.list()
    print '-'*80
    if sys.platform[:3] == 'win': raw_input()      # windows getpass is odd
    raw_input('[Press Enter key]')

    for i in range(msgCount):
        hdr, message, octets = server.retr(i+1)    # octets is byte count
        for line in message: print line            # retrieve, print all mail
        print '-'*80                               # mail box locked till quit
        if i < msgCount - 1: 
           raw_input('[Press Enter key]')
finally:                                           # make sure we unlock mbox
    server.quit()                                  # else locked till timeout
print 'Bye.'

Though primitive, this script illustrates the basics of reading email in Python. To establish a connection to an email server, we start by making an instance of the poplib.POP3 object, passing in the email server machine's name:

server = poplib.POP3(mailserver)

If this call doesn't raise an exception, we're connected (by socket) to the POP server listening for requests on POP port number 110 at the machine where our email account lives. The next thing we need to do before fetching messages is tell the server our username and password; notice that the password method is called pass_ -- without the trailing underscore, pass would name a reserved word and trigger a syntax error:

server.user(mailuser)                      # connect, login to mail server
server.pass_(mailpasswd)                   # pass is a reserved word

To keep things simple and relatively secure, this script always asks for the account password interactively; the getpass module we met in the FTP section of this chapter is used to input but not display a password string typed by the user.

Once we've told the server our username and password, we're free to fetch mailbox information with the stat method (number messages, total bytes among all messages), and fetch a particular message with the retr method (pass the message number; they start at 1):

msgCount, msgBytes = server.stat()
hdr, message, octets = server.retr(i+1)    # octets is byte count

When we're done, we close the email server connection by calling the POP object's quit method:

server.quit()                              # else locked till timeout

Notice that this call appears inside the finally clause of a try statement that wraps the bulk of the script. To minimize complications associated with changes, POP servers lock your email box between the time you first connect and the time you close your connection (or until an arbitrarily long system-defined time-out expires). Because the POP quit method also unlocks the mailbox, it's crucial that we do this before exiting, whether an exception is raised during email processing or not. By wrapping the action in a try/finally statement, we guarantee that the script calls quit on exit to unlock the mailbox to make it accessible to other processes (e.g., delivery of incoming email).

Here is the popmail script in action, displaying two messages in my account's mailbox on machine pop.rmi.net -- the domain name of the mail server machine at rmi.net, configured in module mailconfig:

C:\...\PP2E\Internet\Email>python popmail.py
Password for pop.rmi.net?
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <4860000073ed6c39@chevalier>
There are 2 mail messages in 1386 bytes
('+OK 2 messages (1386 octets)', ['1 744', '2 642'], 14)
--------------------------------------------------------------------------------


[Press Enter key]
Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:13:33 2000)
X-From_: [email protected]  Wed Jul 12 16:10:28 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA21434
        for <[email protected]>; Wed, 12 Jul 2000 16:10:27 -0600 (MDT)
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Date: Wed Jul 12 16:03:59 2000
Subject: I'm a Lumberjack, and I'm okay
X-Mailer: PyMailGui Version 1.0 (Python)

I cut down trees, I skip and jump,
I like to press wild flowers...

--------------------------------------------------------------------------------

[Press Enter key]
Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:13:54 2000)
X-From_: [email protected]  Wed Jul 12 16:12:42 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA24093
        for <[email protected]>; Wed, 12 Jul 2000 16:12:37 -0600 (MDT)
Message-Id: <[email protected]>
From: [email protected]
To: [email protected]
Date: Wed Jul 12 16:06:12 2000
Subject: testing
X-Mailer: PyMailGui Version 1.0 (Python)

Testing Python mail tools.

--------------------------------------------------------------------------------

Bye.

This interface is about as simple as it could be -- after connecting to the server, it prints the complete raw text of one message at a time, pausing between each until you type the enter key. The raw_input built-in is called to wait for the key press between message displays.^[6] The pause keeps messages from scrolling off the screen too fast; to make them visually distinct, emails are also separated by lines of dashes. We could make the display more fancy (e.g., we'll pick out parts of messages in later examples with the rfc822 module), but here we simply display the whole message that was sent.

If you look closely at these mails' text, you may notice that they were actually sent by another program called PyMailGui (a program we'll meet near the end of this chapter). The "X-Mailer" header line, if present, typically identifies the sending program. In fact, there are a variety of extra header lines that can be sent in a message's text. The "Received:" headers, for example, trace the machines that a message passed though on its way to the target mailbox. Because popmail prints the entire raw text of a message, you see all headers here, but you may see only a few by default in end-user-oriented mail GUIs such as Outlook.

Before we move on, I should also point out that this script never deletes mail from the server. Mail is simply retrieved and printed and will be shown again the next time you run the script (barring deletion in another tool). To really remove mail permanently, we need to call other methods (e.g., server.dele(msgnum)) but such a capability is best deferred until we develop more interactive mail tools.

11.3.2 SMTP: Sending Email

There is a proverb in hackerdom that states that every useful computer program eventually grows complex enough to send email. Whether such somewhat ancient wisdom rings true or not in practice, the ability to automatically initiate email from within a program is a powerful tool.

For instance, test systems can automatically email failure reports, user interface programs can ship purchase orders to suppliers by email, and so on. Moreover, a portable Python mail script could be used to send messages from any computer in the world with Python and an Internet connection. Freedom from dependence on mail programs like Outlook is an attractive feature if you happen to make your living traveling around teaching Python on all sorts of computers.

Luckily, sending email from within a Python script is just as easy as reading it. In fact, there are at least four ways to do so:

Calling os.popen to launch a command-line mail program

On some systems, you can send email from a script with a call of the form:

os.popen('mail -s "xxx" [email protected]', 'w').write(text)

As we've seen earlier in the book, the popen tool runs the command-line string passed to its first argument, and returns a file-like object connected to it. If we use an open mode of "w", we are connected to the command's standard input stream -- here, we write the text of the new mail message to the standard Unix mail command-line program. The net effect is as if we had run mail interactively, but it happens inside a running Python script.

Running the sendmail program

The open source sendmail program offers another way to initiate mail from a program. Assuming it is installed and configured on your system, you can launch it using Python tools like the os.popen call of the previous paragraph.

Using the standard smtplib Python module

Python's standard library comes with support for the client-side interface to SMTP -- the Simple Mail Transfer Protocol -- a higher-level Internet standard for sending mail over sockets. Like the poplib module we met in the previous section, smtplib hides all the socket and protocol details, and can be used to send mail on any machine with Python and a socket-based Internet link.

Fetching and using third party packages and tools

Other tools in the open source library provide higher-level mail handling packages for Python (accessible from http://www.python.org). Most build upon one of the prior three techniques.

Of these four options, smtplib is by far the most portable and powerful. Using popen to spawn a mail program usually works on Unix-like platforms only, not on Windows (it assumes a command-line mail program). And although the sendmail program is powerful, it is also somewhat Unix-biased, complex, and may not be installed even on all Unix-like machines.

By contrast, the smtplib module works on any machine that has Python and an Internet link, including Unix, Linux, and Windows. Moreover, SMTP affords us much control over the formatting and routing of email. Since it is arguably the best option for sending mail from a Python script, let's explore a simple mailing program that illustrates its interfaces. The Python script shown in Example 11-17 is intended to be used from an interactive command line; it reads a new mail message from the user and sends the new mail by SMTP using Python's smtplib module.

Example 11-17. PP2E\Internet\Email\smtpmail.py

#!/usr/local/bin/python
######################################################
# use the Python SMTP mail interface module to send
# email messages; this is just a simple one-shot 
# send script--see pymail, PyMailGui, and PyMailCgi
# for clients with more user interaction features, 
# and popmail.py for a script which retrieves mail; 
######################################################

import smtplib, string, sys, time, mailconfig
mailserver = mailconfig.smtpservername         # ex: starship.python.net

From = string.strip(raw_input('From? '))       # ex: [email protected]
To   = string.strip(raw_input('To?   '))       # ex: [email protected]
To   = string.split(To, ';')                   # allow a list of recipients
Subj = string.strip(raw_input('Subj? '))

# prepend standard headers
date = time.ctime(time.time())
text = ('From: %s\nTo: %s\nDate: %s\nSubject: %s\n' 
                         % (From, string.join(To, ';'), date, Subj))

print 'Type message text, end with line=(ctrl + D or Z)'
while 1:
    line = sys.stdin.readline()
    if not line: 
        break                        # exit on ctrl-d/z
  # if line[:4] == 'From':
  #     line = '>' + line            # servers escape for us
    text = text + line

if sys.platform[:3] == 'win': print
print 'Connecting...'
server = smtplib.SMTP(mailserver)              # connect, no login step
failed = server.sendmail(From, To, text)
server.quit() 
if failed:                                     # smtplib may raise exceptions
    print 'Failed recipients:', failed         # too, but let them pass here
else:
    print 'No errors.'
print 'Bye.'

Most of this script is user interface -- it inputs the sender's address ("From"), one or more recipient addresses ("To", separated by ";" if more than one), and a subject line. The sending date is picked up from Python's standard time module, standard header lines are formatted, and the while loop reads message lines until the user types the end-of-file character (Ctrl-Z on Windows, Ctrl-D on Linux).

The rest of the script is where all the SMTP magic occurs: to send a mail by SMTP, simply run these two sorts of calls:

server = smtplib.SMTP(mailserver): Make an instance of the SMTP object, passing in the name of the SMTP server that will dispatch the message first. If this doesn't throw an exception, you're connected to the SMTP server via a socket when the call returns.
failed = server.sendmail(From, To, text): Call the SMTP object's sendmail method, passing in the sender address, one or more recipient addresses, and the text of the message itself with as many standard mail header lines as you care to provide.

When you're done, call the object's quit method to disconnect from the server. Notice that, on failure, the sendmail method may either raise an exception or return a list of the recipient addresses that failed; the script handles the latter case but lets exceptions kill the script with a Python error message.

11.3.2.1 Sending messages

Okay -- let's ship a few messages across the world. The smtpmail script is a one-shot tool: each run allows you to send a single new mail message. Like most of the client-side tools in this chapter, it can be run from any computer with Python and an Internet link. Here it is running on Windows 98:

C:\...\PP2E\Internet\Email>python smtpmail.py
From? [email protected]
To?   [email protected]
Subj? A B C D E F G
Type message text, end with line=(ctrl + D or Z)
Fiddle de dum, Fiddle de dee,
Eric the half a bee.

Connecting...
No errors.
Bye.

This mail is sent to my address ([email protected]), so it ultimately shows up in my mailbox at my ISP, but only after being routed through an arbitrary number of machines on the Net, and across arbitrarily distant network links. It's complex at the bottom, but usually, the Internet "just works."

Notice the "From" address, though -- it's completely fictitious (as far as I know, at least). It turns out that we can usually provide any "From" address we like because SMTP doesn't check its validity (only its general format is checked). Furthermore, unlike POP, there is no notion of a username or password in SMTP, so the sender is more difficult to determine. We need only pass email to any machine with a server listening on the SMTP port, and don't need an account on that machine. Here, [email protected] works fine as the sender; [email protected] would work just as well.

I'm going to tell you something now for instructional purposes only: it turns out that this behavior is the basis of all those annoying junk emails that show up in your mailbox without a real sender's address.^[7] Salesmen infected with e-millionaire mania will email advertising to all addresses on a list without providing a real "From" address, to cover their tracks.

Normally, of course, you should use the same "To" address in the message and the SMTP call, and provide your real email address as the "From" value (that's the only way people will be able to reply to your message). Moreover, apart from teasing your significant other, sending phony addresses is just plain bad Internet citizenship. Let's run the script again to ship off another mail with more politically correct coordinates:

C:\...\PP2E\Internet\Email>python smtpmail.py
From? [email protected]
To?   [email protected]
Subj? testing smtpmail
Type message text, end with line=(ctrl + D or Z)
Lovely Spam! Wonderful Spam!
Connecting...
No errors.
Bye.

At this point, we could run whatever email tool we normally use to access our mailbox to verify the results of these two send operations; the two new emails should show up in our mailbox regardless of which mail client is used to view them. Since we've already written a Python script for reading mail, though, let's put it to use as a verification tool -- running the popmail script from the last section reveals our two new messages at the end of the mail list:

C:\...\PP2E\Internet\Email>python popmail.py 
Password for pop.rmi.net?
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <c4050000b6ee6c39@chevalier>
There are 6 mail messages in 10941 bytes
('+OK 6 messages (10941 octets)', ['1 744', '2 642', '3 4456', '4 697', '5 3791'
, '6 611'], 44)
--------------------------------------------------------------------------------
...
 ...lines omitted...
...
[Press Enter key]
Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:19:20 2000)
X-From_: [email protected]  Wed Jul 12 16:16:31 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA28647
        for <[email protected]>; Wed, 12 Jul 2000 16:16:30 -0600 (MDT)
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Date: Wed Jul 12 16:09:21 2000
Subject: A B C D E F G

Fiddle de dum, Fiddle de dee,
Eric the half a bee.

--------------------------------------------------------------------------------

[Press Enter key]
Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:19:51 2000)
X-From_: [email protected]  Wed Jul 12 16:17:58 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA00415
        for <[email protected]>; Wed, 12 Jul 2000 16:17:57 -0600 (MDT)
Message-Id: <[email protected]>
From: [email protected]
To: [email protected]
Date: Wed Jul 12 16:10:55 2000
Subject: testing smtpmail

Lovely Spam! Wonderful Spam!

--------------------------------------------------------------------------------

Bye.

11.3.2.2 More ways to abuse the Net

The first mail here was the one we sent with a fictitious address; the second was the more legitimate message. Like "From" addresses, header lines are a bit arbitrary under SMTP, too. smtpmail automatically adds "From:" and "To:" header lines in the message's text with the same addresses as passed to the SMTP interface, but only as a polite convention. Sometimes, though, you can't tell who a mail was sent to either -- to obscure the target audience, spammers also may play games with "Bcc" blind copies or the contents of headers in the message's text.

For example, if we change smtpmail to not automatically generate a "To:" header line with the same address(es) sent to the SMTP interface call, we can manually type a "To:" header that differs from the address we're really sending to:

C:\...\PP2E\Internet\Email>python smtpmail-noTo.py
From? [email protected]
To?   [email protected]
Subj? a b c d e f g
Type message text, end with line=(ctrl + D or Z)
To: [email protected]
Fiddle de dum, Fiddle de dee,
Eric the half a bee.
Connecting...
No errors.
Bye.

In some ways, the "From" and "To" addresses in send method calls and message header lines are similar to addresses on envelopes and letters in envelopes. The former is used for routing, but the latter is what the reader sees. Here, I gave the "To" address as my mailbox on the starship.python.net server, but gave a fictitious name in the manually typed "To:" header line; the first address is where it really goes. A command-line mail tool running on starship by Telnet reveals two bogus mails sent -- one with a bad "From:", and the one with an additionally bad "To:" that we just sent:

[lutz@starship lutz]$ mail 
Mail version 8.1 6/6/93.  Type ? for help.
"/home/crew/lutz/Mailbox": 22 messages 12 new 22 unread
 ...more...
>N 21 Eric.the.Half.a.Bee@  Thu Jul 13 20:22  20/789   "A B C D E F G"
 N 22 Eric.the.Half.a.Bee@  Thu Jul 13 20:26  19/766   "a b c d e f g"

& 21
Message 21:
From [email protected] Thu Jul 13 20:21:18 2000
Delivered-To: [email protected]
From: [email protected] 
To: [email protected] 
Date: Thu Jul 13 14:15:55 2000
Subject: A B C D E F G

Fiddle de dum, Fiddle de dee,
Eric the half a bee.

& 22
Message 22:
From [email protected] Thu Jul 13 20:26:34 2000
Delivered-To: [email protected]
From: [email protected] 
Date: Thu Jul 13 14:20:22 2000
Subject: a b c d e f g
To: [email protected] 

Fiddle de dum, Fiddle de dee,
Eric the half a bee.

If your mail tool picks out the "To:" line, such mails look odd when viewed. For instance, here's another sent to my rmi.net mailbox:

C:\...\PP2E\Internet\Email>python smtpmail-noTo.py
From? [email protected]
To?   [email protected]
Subj? Killer bunnies
Type message text, end with line=(ctrl + D or Z)
To: [email protected]
Run away!  Run away! ...
Connecting...
No errors.
Bye.

When it shows up in my mailbox on rmi.net, it's difficult to tell much about its origin or destination in either Outlook or a Python-coded mail tool we'll meet near the end of this chapter (see Figure 11-8 and Figure 11-9). And its raw text will only show the machines it has been routed through.

Figure 11-8. Bogus mail in Outlook

figs/ppy2_1108.gif

Figure 11-9. Bogus mail in a Python mail tool (PyMailGui)

figs/ppy2_1109.gif

Once again, though -- don't do this unless you have good reason. I'm showing it for header-line illustration purposes (e.g., in a later section, we'll add an "X-mailer:" header line to identify the sending program). Furthermore, to stop a criminal, you sometimes need to think like one -- you can't do much about spam mail unless you understand how it is generated. To write an automatic spam filter that deletes incoming junk mail, for instance, you need to know the telltale signs to look for in a message's text. And "To" address juggling may be useful in the context of legitimate mailing lists.

But really, sending email with bogus "From:" and "To:" lines is equivalent to making anonymous phone calls. Most mailers won't even let you change the "From" line, and don't distinguish between the "To" address and header line, but SMTP is wide open in this regard. Be good out there; okay?

11.3.2.3 Back to the big Internet picture

So where are we at in the Internet abstraction model now? Because mail is transferred over sockets (remember sockets?), they are at the root of all of this email fetching and sending. All email read and written ultimately consists of formatted bytes shipped over sockets between computers on the Net. As we've seen, though, the POP and SMTP interfaces in Python hide all the details. Moreover, the scripts we've begun writing even hide the Python interfaces and provide higher-level interactive tools.

Both popmail and smtpmail provide portable email tools, but aren't quite what we'd expect in terms of usability these days. In the next section, we'll use what we've seen thus far to implement a more interactive mail tool. At the end of this email section, we'll also code a Tk email GUI, and then we'll go on to build a web-based interface in a later chapter. All of these tools, though, vary primarily in terms of user interface only; each ultimately employs the mail modules we've met here to transfer mail message text over the Internet with sockets.

11.3.3 A Command-Line Email Client

Now, let's put together what we've learned about fetching and sending email in a simple but functional command-line email tool. The script in Example 11-18 implements an interactive email session -- users may type commands to read, send, and delete email messages.

Example 11-18. PP2E\Internet\Emal\pymail.py

#!/usr/local/bin/python
######################################################
# A simple command-line email interface client in 
# Python; uses Python POP3 mail interface module to
# view pop email account messages; uses rfc822 and
# StringIO modules to extract mail message headers; 
######################################################

import poplib, rfc822, string, StringIO

def connect(servername, user, passwd):
    print 'Connecting...'
    server = poplib.POP3(servername)
    server.user(user)                    # connect, login to mail server
    server.pass_(passwd)                 # pass is a reserved word
    print server.getwelcome()            # print returned greeting message 
    return server

def loadmessages(servername, user, passwd, loadfrom=1):
    server = connect(servername, user, passwd)
    try:
        print server.list()
        (msgCount, msgBytes) = server.stat()
        print 'There are', msgCount, 'mail messages in', msgBytes, 'bytes'
        print 'Retrieving:',
        msgList = []
        for i in range(loadfrom, msgCount+1):            # empty if low >= high
            print i,                                     # fetch mail now
            (hdr, message, octets) = server.retr(i)      # save text on list
            msgList.append(string.join(message, '\n'))   # leave mail on server 
        print
    finally:
        server.quit()                                    # unlock the mail box
    assert len(msgList) == (msgCount - loadfrom) + 1     # msg nums start at 1
    return msgList

def deletemessages(servername, user, passwd, toDelete, verify=1):
    print 'To be deleted:', toDelete
    if verify and raw_input('Delete?')[:1] not in ['y', 'Y']:
        print 'Delete cancelled.'
    else:
        server = connect(servername, user, passwd)
        try:
            print 'Deleting messages from server.'
            for msgnum in toDelete:                 # reconnect to delete mail
                server.dele(msgnum)                 # mbox locked until quit()
        finally:
            server.quit()

def showindex(msgList):
    count = 0   
    for msg in msgList:                      # strip,show some mail headers
        strfile = StringIO.StringIO(msg)     # make string look like a file
        msghdrs = rfc822.Message(strfile)    # parse mail headers into a dict
        count   = count + 1
        print '%d:\t%d bytes' % (count, len(msg))
        for hdr in ('From', 'Date', 'Subject'):
            try:
                print '\t%s=>%s' % (hdr, msghdrs[hdr])
            except KeyError:
                print '\t%s=>(unknown)' % hdr
            #print '\n\t%s=>%s' % (hdr, msghdrs.get(hdr, '(unknown)')
        if count % 5 == 0:
            raw_input('[Press Enter key]')  # pause after each 5 

def showmessage(i, msgList):
    if 1 <= i <= len(msgList):
        print '-'*80
        print msgList[i-1]              # this prints entire mail--hdrs+text
        print '-'*80                    # to get text only, call file.read()
    else:                               # after rfc822.Message reads hdr lines
        print 'Bad message number'

def savemessage(i, mailfile, msgList):
    if 1 <= i <= len(msgList):
        open(mailfile, 'a').write('\n' + msgList[i-1] + '-'*80 + '\n')
    else:
        print 'Bad message number'

def msgnum(command):
    try:
        return string.atoi(string.split(command)[1])
    except:
        return -1   # assume this is bad

helptext = """
Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text
"""

def interact(msgList, mailfile):
    showindex(msgList)
    toDelete = []
    while 1:
        try:
            command = raw_input('[Pymail] Action? (i, l, d, s, m, q, ?) ')
        except EOFError:
            command = 'q'

        # quit
        if not command or command == 'q': 
            break

        # index
        elif command[0] == 'i':          
            showindex(msgList)

        # list
        elif command[0] == 'l':         
            if len(command) == 1:
                for i in range(1, len(msgList)+1): 
                    showmessage(i, msgList)
            else:
                showmessage(msgnum(command), msgList)

        # save
        elif command[0] == 's':        
            if len(command) == 1:
                for i in range(1, len(msgList)+1): 
                    savemessage(i, mailfile, msgList)
            else:
                savemessage(msgnum(command), mailfile, msgList)

        # delete 
        elif command[0] == 'd':               
            if len(command) == 1:
                toDelete = range(1, len(msgList)+1)     # delete all later
            else:
                delnum = msgnum(command)
                if (1 <= delnum <= len(msgList)) and (delnum not in toDelete):
                    toDelete.append(delnum)
                else:
                    print 'Bad message number'

        # mail
        elif command[0] == 'm':                # send a new mail via smtp
            try:                               # reuse existing script
                execfile('smtpmail.py', {})    # run file in own namespace
            except:
                print 'Error - mail not sent'  # don't die if script dies

        elif command[0] == '?':
            print helptext
        else:
            print 'What? -- type "?" for commands help'
    return toDelete

if __name__ == '__main__':
    import sys, getpass, mailconfig
    mailserver = mailconfig.popservername        # ex: 'starship.python.net'
    mailuser   = mailconfig.popusername          # ex: 'lutz'
    mailfile   = mailconfig.savemailfile         # ex:  r'c:\stuff\savemail'
    mailpswd   = getpass.getpass('Password for %s?' % mailserver)

    if sys.platform[:3] == 'win': raw_input()    # clear stream
    print '[Pymail email client]'
    msgList    = loadmessages(mailserver, mailuser, mailpswd)     # load all
    toDelete   = interact(msgList, mailfile)
    if toDelete: deletemessages(mailserver, mailuser, mailpswd, toDelete)
    print 'Bye.'

There isn't much new here -- just a combination of user-interface logic and tools we've already met, plus a handful of new tricks:

Loads

This client loads all email from the server into an in-memory Python list only once, on startup; you must exit and restart to reload newly arrived email.

Saves

On demand, pymail saves the raw text of a selected message into a local file, whose name you place in the mailconfig module.

Deletions

We finally support on-request deletion of mail from the server here: in pymail, mails are selected for deletion by number, but are still only physically removed from your server on exit, and then only if you verify the operation. By deleting only on exit, we avoid changing mail message numbers during a session -- under POP, deleting a mail not at the end of the list decrements the number assigned to all mails following the one deleted. Since mail is cached in memory by pymail, future operations on the numbered messages in memory may be applied to the wrong mail if deletions were done immediately.^[8]

Parsing messages

Pymail still displays the entire raw text of a message on listing commands, but the mail index listing only displays selected headers parsed out of each message. Python's rfc822 module is used to extract headers from a message: the call rfc822.Message(strfile) returns an object with dictionary interfaces for fetching the value of a message header by name string (e.g., index the object on string "From" to get the value of the "From" header line).

Although unused here, anything not consumed from strfile after a Message call is the body of the message, and can be had by calling strfile.read.Message reads the message headers portion only. Notice that strfile is really an instance of the standard StringIO.StringIO object. This object wraps the message's raw text (a simple string) in a file-like interface; rfc822.Message expects a file interface, but doesn't care if the object is a true file or not. Once again, interfaces are what we code to in Python, not specific types. Module StringIO is useful anytime you need to make a string look like a file.

By now, I expect that you know enough Python to read this script for a deeper look, so rather than saying more about its design here, let's jump into an interactive pymail session to see how it works.

Does Anybody Really Know What Time It Is?

Minor caveat: the simple date format used in the smtpmail program (and others in this book) doesn't quite follow the SMTP date formatting standard. Most servers don't care, and will let any sort of date text appear in date header lines. In fact, I've never seen a mail fail due to date formats.

If you want to be more in line with the standard, though, you could format the date header with code like this (adopted from standard module urllib, and parseable with standard tools such as the rfc822 module and the time.strptime call):

import time
gmt = time.gmtime(time.time())
fmt = '%a, %d %b %Y %H:%M:%S GMT'
str = time.strftime(fmt, gmt)
hdr = 'Date: ' + str
print hdr

The hdr variable looks like this when this code is run:

Date: Fri, 02 Jun 2000 16:40:41 GMT

instead of the date format currently used by the smtpmail program:

>>> import time
>>> time.ctime(time.time())
'Fri Jun 02 10:23:51 2000'

The time.strftime call allows arbitrary date and time formatting (time.ctime is just one standard format), but we will leave rooting out the workings of all these calls as a suggested exercise for the reader; consult the time module's library manual entry. We'll also leave placing such code in a reusable file to the more modular among you. Time and date formatting rules are necessary, but aren't pretty.

11.3.3.1 Running the pymail command-line client

Let's start up pymail to read and delete email at our mail server and send new messages. Pymail runs on any machine with Python and sockets, fetches mail from any email server with a POP interface on which you have an account, and sends mail via the SMTP server you've named in the mailconfig module.

Here it is in action running on my Windows 98 laptop machine; its operation is identical on other machines. First, we start the script, supply a POP password (remember, SMTP servers require no password), and wait for the pymail email list index to appear:

C:\...\PP2E\Internet\Email>python pymail.py
Password for pop.rmi.net?

[Pymail email client]
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <870f000002f56c39@chevalier>
('+OK 5 messages (7150 octets)', ['1 744', '2 642', '3 4456', '4 697', '5 611'],
 36)
There are 5 mail messages in 7150 bytes
Retrieving: 1 2 3 4 5
There are 5 mail messages in 7150 bytes
Retrieving: 1 2 3 4 5
1:      676 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:03:59 2000
        Subject=>I'm a Lumberjack, and I'm okay
2:      587 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:06:12 2000
        Subject=>testing
3:      4307 bytes
        From=>"Mark Hammond" <[email protected]>
        Date=>Wed, 12 Jul 2000 18:11:58 -0400
        Subject=>[Python-Dev] Python .NET (was Preventing 1.5 extensions...
4:      623 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:09:21 2000
        Subject=>A B C D E F G
5:      557 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:10:55 2000
        Subject=>testing smtpmail
[Press Enter key]
[Pymail] Action? (i, l, d, s, m, q, ?) l 5
--------------------------------------------------------------------------------

Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:45:38 2000)
X-From_: [email protected]  Wed Jul 12 16:17:58 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA00415
        for <[email protected]>; Wed, 12 Jul 2000 16:17:57 -0600 (MDT)
Message-Id: <[email protected]>
From: [email protected]
To: [email protected]
Date: Wed Jul 12 16:10:55 2000
Subject: testing smtpmail

Lovely Spam! Wonderful Spam!

--------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) l 4
--------------------------------------------------------------------------------

Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:45:38 2000)
X-From_: [email protected]  Wed Jul 12 16:16:31 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA28647
        for <[email protected]>; Wed, 12 Jul 2000 16:16:30 -0600 (MDT)
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Date: Wed Jul 12 16:09:21 2000
Subject: A B C D E F G

Fiddle de dum, Fiddle de dee,
Eric the half a bee.

--------------------------------------------------------------------------------

Once pymail downloads your email to a Python list on the local client machine, you type command letters to process it. The "l" command lists (prints) the contents of a given mail number; here, we used it to list the two emails we wrote with the smtpmail script in the last section.

Pymail also lets us get command help, delete messages (deletions actually occur at the server on exit from the program), and save messages away in a local text file whose name is listed in the mailconfig module we saw earlier:

[Pymail] Action? (i, l, d, s, m, q, ?) ?

Available commands:
i     - index display
l n?  - list all messages (or just message n)
d n?  - mark all messages for deletion (or just message n)
s n?  - save all messages to a file (or just message n)
m     - compose and send a new mail message
q     - quit pymail
?     - display this help text

[Pymail] Action? (i, l, d, s, m, q, ?) d 1
[Pymail] Action? (i, l, d, s, m, q, ?) s 4

Now, let's pick the "m" mail compose option -- pymail simply executes the smptmail script we wrote in the prior section and resumes its command loop (why reinvent the wheel?). Because that script sends by SMTP, you can use arbitrary "From" addresses here; but again, you generally shouldn't do that (unless, of course, you're trying to come up with interesting examples for a book).

The smtpmail script is run with the built-in execfile function; if you look at pymail's code closely, you'll notice that it passes an empty dictionary to serve as the script's namespace to prevent its names from clashing with names in pymail code. execfile is a handy way to reuse existing code written as a top-level script, and thus is not really importable. Technically speaking, code in the file smtplib.py would run when imported, but only on the first import (later imports would simply return the loaded module object). Other scripts that check the __name__ attribute for __main__ won't generally run when imported at all:

[Pymail] Action? (i, l, d, s, m, q, ?) m
From? [email protected]
To?   [email protected]
Subj? Among our weapons are these:
Type message text, end with line=(ctrl + D or Z)
Nobody Expects the Spanish Inquisition!
Connecting...
No errors.
Bye.
[Pymail] Action? (i, l, d, s, m, q, ?) q
To be deleted: [1]
Delete?y
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <8e2e0000aff66c39@chevalier>
Deleting messages from server.
Bye.

As mentioned, deletions really happen only on exit; when we quit pymail with the "q" command, it tells us which messages are queued for deletion, and verifies the request. If verified, pymail finally contacts the mail server again and issues POP calls to delete the selected mail messages.

Because pymail downloads mail from your server into a local Python list only once at startup, though, we need to start pymail again to re-fetch mail from the server if we want to see the result of the mail we sent and the deletion we made. Here, our new mail shows up as number 5, and the original mail assigned number 1 is gone:

C:\...\PP2E\Internet\Email>python pymail.py
Password for pop.rmi.net?

[Pymail email client]
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <40310000d5f66c39@chevalier>
...
There are 5 mail messages in 7090 bytes
Retrieving: 1 2 3 4 5
1:      587 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:06:12 2000
        Subject=>testing
2:      4307 bytes
        From=>"Mark Hammond" <[email protected]>
        Date=>Wed, 12 Jul 2000 18:11:58 -0400
        Subject=>[Python-Dev] Python .NET (was Preventing 1.5 extensions...
3:      623 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:09:21 2000
        Subject=>A B C D E F G
4:      557 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:10:55 2000
        Subject=>testing smtpmail
5:      615 bytes
        From=>[email protected]
        Date=>Wed Jul 12 16:44:58 2000
        Subject=>Among our weapons are these:
[Press Enter key]
[Pymail] Action? (i, l, d, s, m, q, ?) l 5
--------------------------------------------------------------------------------

Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:53:24 2000)
X-From_: [email protected]  Wed Jul 12 16:51:53 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA11127
        for <[email protected]>; Wed, 12 Jul 2000 16:51:52 -0600 (MDT)
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Date: Wed Jul 12 16:44:58 2000
Subject: Among our weapons are these:

Nobody Expects the Spanish Inquisition!

--------------------------------------------------------------------------------

[Pymail] Action? (i, l, d, s, m, q, ?) q
Bye.

Finally, here is the mail save file, containing the one message we asked to be saved in the prior session; it's simply the raw text of saved emails, with separator lines. This is both human- and machine-readable -- in principle, another script could load saved mail from this file into a Python list, by calling the string.split function on the file's text with the separator line as a delimiter:

C:\...\PP2E\Internet\Email>type c:\stuff\etc\savemail.txt

Received: by chevalier (mbox lutz)
 (with Cubic Circle's cucipop (v1.31 1998/05/13) Wed Jul 12 16:45:38 2000)
X-From_: [email protected]  Wed Jul 12 16:16:31 2000
Return-Path: <[email protected]>
Received: from VAIO (dial-218.101.denco.rmi.net [166.93.218.101])
        by chevalier.rmi.net (8.9.3/8.9.3) with ESMTP id QAA28647
        for <[email protected]>; Wed, 12 Jul 2000 16:16:30 -0600 (MDT)
From: [email protected]
Message-Id: <[email protected]>
To: [email protected]
Date: Wed Jul 12 16:09:21 2000
Subject: A B C D E F G

Fiddle de dum, Fiddle de dee,
Eric the half a bee.


--------------------------------------------------------------------------------

11.3.4 Decoding Mail Message Attachments

In the last section, we learned how to parse out email message headers and bodies with the rfc822 and StringIO modules. This isn't quite enough for some messages, though. In this section, I will introduce tools that go further, to handle complex information in the bodies of email messages.

One of the drawbacks of stubbornly clinging to a Telnet command-line email interface is that people sometimes send email with all sorts of attached information -- pictures, MS Word files, uuencoded tar files, base64-encoded documents, HTML pages, and even executable scripts that can trash your computer if opened.^[9] Not all attachments are crucial, of course, but email isn't always just ASCII text these days.

Before I overcame my Telnet habits, I needed a way to extract and process all those attachments from a command line (I tried the alternative of simply ignoring all attachments completely, but that works only for a while). Luckily, Python's library tools make handling attachments and common encodings easy and portable. For simplicity, all of the following scripts work on the raw text of a saved email message (or parts of such), but they could just as easily be incorporated into the email programs in this book to extract email components automatically.

11.3.4.1 Decoding base64 data

Let's start with something simple. Mail messages and attachments are frequently sent in an encoding format such as uu or base64; binary data files in particular must be encoded in a textual format for transit using one of these encoding schemes. On the receiving end, such encoded data must first be decoded before it can be viewed, opened, or otherwise used. The Python program in Example 11-19 knows how to perform base64 decoding on data stored in a file.

Example 11-19. PP2E\Internet\Email\decode64.py

#!/usr/bin/env python
#################################################
# Decode mail attachments sent in base64 form.
# This version assumes that the base64 encoded 
# data has been extracted into a separate file.
# It doesn't understand mime headers or parts.
# uudecoding is similar (uu.decode(iname)),
# as is binhex decoding (binhex.hexbin(iname)).
# You can also do this with module mimetools:
# mimetools.decode(input, output, 'base64').
#################################################

import sys, base64

iname = 'part.txt'
oname = 'part.doc'

if len(sys.argv) > 1:
    iname, oname = sys.argv[1:]      # % python prog [iname oname]?

input  = open(iname, 'r')
output = open(oname, 'wb')           # need wb on windows for docs
base64.decode(input, output)         # this does most of the work
print 'done'

There's not much to look at here, because all the low-level translation work happens in the Python base64 module; we simply call its decode method with open input and output files. Other transmission encoding schemes are supported by different Python modules -- uu for uuencoding, binhex for binhex format, and so on. All of these export interfaces that are analogous to base64, and are as easy to use; uu and binhex use the output filename in the data (see the library manual for details).

At a slightly higher level of generality, the mimetools module exports a decode method, which supports all encoding schemes. The desired decoding is given by a passed-in argument, but the net result is the same, as shown in Example 11-20.

Example 11-20. PP2E\Internet\Email\decode64_b.py

#!/usr/bin/env python
#################################################
# Decode mail attachments sent in base64 form.
# This version tests the mimetools module.     
#################################################

import sys, mimetools

iname = 'part.txt'
oname = 'part.doc'

if len(sys.argv) > 1:
    iname, oname = sys.argv[1:]      # % python prog [iname oname]?

input  = open(iname, 'r')
output = open(oname, 'wb')
mimetools.decode(input, output, 'base64')     # or 'uuencode', etc.
print 'done'

To use either of these scripts, you must first extract the base64-encoded data into a text file. Save a mail message in a text file using your favorite email tool, then edit the file to save only the base64-encoded portion with your favorite text editor. Finally, pass the data file to the script, along with a name for the output file where the decoded data will be saved. Here are the base64 decoders at work on a saved data file; the generated output file turns out to be the same as the one saved for an attachment in MS Outlook earlier:

C:\Stuff\Mark\etc\jobs\test>python ..\decode64.py t4.64 t4.doc
done

C:\Stuff\Mark\etc\jobs\test>fc /B cand.agr10.22.doc t4.doc
Comparing files cand.agr10.22.doc and t4.doc
FC: no differences encountered


C:\Stuff\Mark\etc\jobs\test>python ..\decode64_b.py t4.64 t4.doc
done

C:\Stuff\Mark\etc\jobs\test>fc /B cand.agr10.22.doc t4.doc
Comparing files cand.agr10.22.doc and t4.doc
FC: no differences encountered

11.3.4.2 Extracting and decoding all parts of a message

The decoding procedure in the previous section is very manual and error-prone; moreover, it handles only one type of encoding (base64), and decodes only a single component of an email message. With a little extra logic, we can improve on this dramatically with the Python mhlib module's multipart message-decoding tools. For instance, the script in Example 11-21 knows how to extract, decode, and save every component in an email message in one step.

Example 11-21. PP2E\Internet\Email\decodeAll.py

#!/usr/bin/env python
#####################################################
# Decode all mail attachments sent in encoded form:
# base64, uu, etc. To use, copy entire mail message
# to mailfile and run:
#    % python ..\decodeAll.py mailfile
# which makes one or more mailfile.part* outputs.
#####################################################

import sys, mhlib
from types import *
iname = 'mailmessage.txt'

if len(sys.argv) == 3:
    iname, oname = sys.argv[1:]        # % python prog [iname [oname]?]?
elif len(sys.argv) == 2:
    iname = sys.argv[1]
    oname = iname + '.part'

def writeparts(part, oname):
    global partnum
    content = part.getbody()                   # decoded content or list
    if type(content) == ListType:              # multiparts: recur for each
        for subpart in content:
            writeparts(subpart, oname) 
    else:                                      # else single decoded part
        assert type(content) == StringType     # use filename if in headers
        print; print part.getparamnames()      # else make one with counter
        fmode = 'wb'
        fname = part.getparam('name')
        if not fname:
            fmode = 'w'
            fname = oname + str(partnum)
            if part.gettype() == 'text/plain':
                fname = fname + '.txt'
            elif part.gettype() == 'text/html':
                fname = fname + '.html'
        output = open(fname, fmode)            # mode must be 'wb' on windows
        print 'writing:', output.name          # for word doc files, not 'w'
        output.write(content)
        partnum = partnum + 1

partnum = 0
input   = open(iname, 'r')                     # open mail file
message = mhlib.Message('.', 0, input)         # folder, number args ignored
writeparts(message, oname)
print 'done: wrote %s parts' % partnum

Because mhlib recognizes message components, this script processes an entire mail message; there is no need to edit the message to extract components manually. Moreover, the components of an mhlib.Message object represent the already-decoded parts of the mail message -- any necessary uu, base64, and other decoding steps have already been automatically applied to the mail components by the time we fetch them from the object. mhlib is smart enough to determine and perform decoding automatically; it supports all common encoding schemes at once, not just a particular format such as base64.

To use this script, save the raw text of an email message in a local file (using whatever mail tool you like), and pass the file's name on the script's command line. Here the script is extracting and decoding the components of two saved mail message files, t4.eml and t5.eml:

C:\Stuff\Mark\etc\jobs\test>python ..\decodeall.py t4.eml

['charset']
writing: t4.eml.part0.txt

['charset']
writing: t4.eml.part1.html

['name']
writing: cand.agr10.22.doc
done: wrote 3 parts


C:\Stuff\Mark\etc\jobs\test>python ..\decodeall.py t5.eml

['charset']
writing: t5.eml.part0.txt

['name']
writing: US West Letter.doc
done: wrote 2 parts

The end result of decoding a message is a set of one or more local files containing the decoded contents of each part of the message. Because the resulting local files are the crux of this script's purpose, it must assign meaningful names to files it creates. The following naming rules are applied by the script:

If a component has an associated "name" parameter in the message, the script stores the component's bytes in a local file of that name. This generally reuses the file's original name on the machine where the mail originated.
Otherwise, the script generates a unique filename for the component by adding a "partN" suffix to the original mail file's name, and trying to guess a file extension based on the component's file type given in the message.

For instance, the message saved away as t4.eml consists of the message body, an alternative HTML encoding of the message body, and an attached Word doc file. When decoding t4.eml:

The first two message components have no "name" parameter, so the script generates names based on the filename and component types -- t4.eml.part0.txt and t4.eml.part1.html -- plain text and HTML code, respectively. On most machines, clicking on the HTML output file should open it in a web browser for formatted viewing.
The last attachment was given an explicit name when attached -- cand.agr10.22.doc -- so it is used as the output file's name directly. Notice that this was an attached MS Word doc file when sent; assuming all went well in transit, double-clicking on the third output file generated by this script should open it in Word.

There are additional tools in the Python library for decoding data fetched over the Net, but we'll defer to the library manual for further details. Again, using this decoding script still involves some manual intervention -- users must save the mail file and type a command to split off its parts into distinct files -- but it's sufficient for handling multipart mail, and it works portably on any machine with Python. Moreover, the decoding interfaces it demonstrates can be adopted in a more automatic fashion by interactive mail clients.

For instance, the decoded text of a message component could be automatically passed to handler programs (e.g., browsers, text editors, Word) when selected, rather than written to local files. It could also be saved in and automatically opened from local temporary files (on Windows, running a simple DOS start command with os.system would open the temporary file). In fact, popular email tools like Outlook use such schemes to support opening attachments. Python-coded email user interfaces could do so, too -- which is a hint about where this chapter is headed next.

11.4 The PyMailGui Email Client

As a finale for this chapter's email tools coverage, this section presents PyMailGui -- a Python/Tkinter program that implements a client-side email processing user interface. It is presented both as an instance of Python Internet scripting and as an example that ties together other tools we've already seen, such as threads and Tkinter GUIs.

Like the pymail program we wrote earlier, PyMailGui runs entirely on your local computer. Your email is fetched from and sent to remote mail servers over sockets, but the program and its user interface run locally. Because of that, PyMailGui is called an email client : it employs Python's client-side tools to talk to mail servers from the local machine. In fact, in some ways, PyMailGui builds on top of pymail to add a GUI. Unlike pymail, though, PyMailGui is a fairly full-featured user interface: email operations are performed with point-and-click operations.

11.4.1 Why PyMailGui?

Like many examples presented in this text, PyMailGui is also a practical, useful program. In fact, I run it on all kinds of machines to check my email while traveling around the world teaching Python classes (it's another workaround for Telnet-challenged ISPs). Although PyMailGui won't put Microsoft Outlook out of business anytime soon, I like it for two reasons:

It's portable

PyMailGui runs on any machine with sockets and a Python with Tkinter installed. Because email is transferred with the Python libraries, any Internet connection will do. Moreover, because the user interface is coded with the Tkinter extension, PyMailGui should work, unchanged, on Windows, the X Windows system (Unix, Linux), and the Macintosh.

Microsoft Outlook is a more feature-rich package, but it has to be run on Windows, and more specifically, on a single Windows machine. Because it generally deletes email from a server as it is downloaded and stores it on the client, you cannot run Outlook on multiple machines without spreading your email across all those machines. By contrast, PyMailGui saves and deletes email only on request, and so it is a bit more friendly to people who check their email in an ad-hoc fashion on arbitrary computers.

It's scriptable

PyMailGui can become anything you want it to be, because it is fully programmable. In fact, this is the real killer feature of PyMailGui and of open source software like Python in general -- because you have full access to PyMailGui's source code, you are in complete control of where it evolves from here. You have nowhere near as much control over commercial, closed products like Outlook; you generally get whatever a large company decided you need, along with whatever bugs that company might have introduced.

As a Python script, PyMailGui is a much more flexible tool. For instance, I can change its layout, disable features, and add completely new functionality quickly, by changing its Python source code. Don't like the mail list display? Change a few lines of code to customize it. Want to save and delete your mail automatically as it is loaded? Add some more code and buttons. Tired of seeing junk mail? Add a few lines of text-processing code to the load function to filter spam. These are just a few examples. The point is that because PyMailGui is written in a high-level, easy-to-maintain scripting language, such customizations are relatively simple, and might even be a lot of fun.^[10]

It's also worth mentioning that PyMailGui achieves this portability and scriptability, and implements a full-featured email interface along the way, in roughly 500 lines of program code. It doesn't have as many bells and whistles as commercial products, but the fact that it gets as close as it does in so few lines of code is a testament to the power of both the Python language and its libraries.

11.4.2 Running PyMailGui

Of course, to script PyMailGui on your own, you'll need to be able to run it. PyMailGui only requires a computer with some sort of Internet connectivity (a PC with a dialup account and modem will do) and an installed Python with the Tkinter extension enabled. The Windows port of Python has this capability, so Windows PC users should be able to run this program immediately by clicking its icon (the Windows port self-installer is on this book's CD (see http://examples.oreilly.com/python2) and also at http://www.python.org). You'll also want to change the file mailconfig.py in the email examples directory to reflect your account's parameters; more on this as we interact with the system.

11.4.3 Presentation Strategy

PyMailGui is easily one of the longest programs in this book (its main script is some 500 lines long, counting blank lines and comments), but it doesn't introduce many library interfaces that we haven't already seen in this book. For instance:

The PyMailGui interface is built with Python's Tkinter, using the familiar listboxes, buttons, and text widgets we met earlier.
Python's rfc822 email header parser module is applied to pull out headers and text of messages.
Python's POP and SMTP library modules are used to fetch, send, and delete mail over sockets.
Python threads, if installed in your Python interpreter, are put to work to avoid blocking during long-running mail operations (loads, sends, deletions).

We're also going to reuse the TextEditor object we wrote in Chapter 9, to view and compose messages, the simple pymail module's tools we wrote earlier in this chapter to load and delete mail from the server, and the mailconfig module of this chapter to fetch email parameters. PyMailGui is largely an exercise in combining existing tools.

On the other hand, because this program is so long, we won't exhaustively document all of its code. Instead, we'll begin by describing how PyMailGui works from an end-user's perspective. After that, we'll list the system's new source code modules without any additional comments, for further study.

Like most longer case studies in this book, this section assumes that you already know enough Python to make sense of the code on your own. If you've been reading this book linearly, you should also know enough about Tkinter, threads, and mail interfaces to understand the library tools applied here. If you get stuck, you may wish to brush-up on the presentation of these topics earlier in the book.

Open Source Software and Camaros

An analogy might help underscore the importance of PyMailGui's scriptability. There are still a few of us who remember a time when it was completely normal for car owners to work on and repair their own automobiles. I still fondly remember huddling with friends under the hood of a 1970 Camaro in my youth, tweaking and customizing its engine. With a little work, we could make it as fast, flashy, and loud as we liked. Moreover, a breakdown in one of those older cars wasn't necessarily the end of the world. There was at least some chance that I could get the car going again on my own.

That's not quite true today. With the introduction of electronic controls and diabolically cramped engine compartments, car owners are usually better off taking their cars back to the dealer or other repair professional for all but the simplest kinds of changes. By and large, cars are no longer user-maintainable products. And if I have a breakdown in my shiny new Jeep, I'm probably going to be completely stuck until an authorized repairperson can get around to towing and fixing my ride.

I like to think of the closed and open software models in the same terms. When I use Microsoft Outlook, I'm stuck both with the feature set that a large company dictates, as well as any bugs that it may harbor. But with a programmable tool like PyMailGui, I can still get under the hood. I can add features, customize the system, and work my way out of any lurking bugs. And I can do so long before the next Outlook patch or release.

At the end of the day, open source software is about freedom. Users, not an arbitrarily far-removed company, have the final say. Not everyone wants to work on his or her own car, of course. On the other hand, software tends to fail much more often than cars, and Python scripting is considerably less greasy than auto mechanics.

11.4.4 Interacting with PyMailGui

To make this case study easier to understand, let's begin by seeing what PyMailGui actually does -- its user interaction and email processing functionality -- before jumping into the Python code that implements that behavior. As you read this part, feel free to jump ahead to the code listings that appear after the screen shots, but be sure to read this section, too; this is where I will explain all the subtleties of PyMailGui's design. After this section, you are invited to study the system's Python source code listings on your own for a better and more complete explanation than can be crafted in English.

11.4.4.1 Getting started

PyMailGui is a Python/Tkinter program, run by executing its top-level script file, PyMailGui.py. Like other Python programs, PyMailGui can be started from the system command line, by clicking on its filename icon in a file explorer interface, or by pressing its button in the PyDemos or PyGadgets launcher bars. However it is started, the first window PyMailGui presents is shown in Figure 11-10.

Figure 11-10. PyMailGui main window start

figs/ppy2_1110.gif

This is the PyMailGui main window -- every operation starts here. It consists of:

A help button (the light blue bar at the top)
A clickable email list area for fetched emails (the middle white section)
A button bar at the bottom for processing messages selected in the list area

In normal operation, users load their email, select an email from the list area by clicking on it, and press a button at the bottom to process it. No mail messages are shown initially; we need to first load them, as we'll see in a moment. Before we do, though, let's press the blue help bar at the top to see what sort of help is available; Figure 11-11 shows the help window pop-up that appears.

Figure 11-11. PyMailGui help pop-up

figs/ppy2_1111.gif

The main part of this window is simply a block of text in a scrolled-text widget, along with two buttons at the bottom. The entire help text is coded as a single triple-quoted string in the Python program. We could get more fancy and spawn a web browser to view HTML-formatted help, but simple text does the job here.^[11] The Cancel button makes this nonmodal (i.e., nonblocking) window go away; more interestingly, the Source button pops up a viewer window with the source code of PyMailGui's main script, as shown in Figure 11-12.

Figure 11-12. PyMailGui source code viewer window

figs/ppy2_1112.gif

Not every program shows you its source code, but PyMailGui follows Python's open source motif. Politics aside, the main point of interest here is that this source viewer window is the same as PyMailGui's email viewer window. All the information here comes from PyMailGui internally; but this same window is used to display and edit mail shipped across the Net, so let's look at its format here:

The top portion consists of a Cancel button to remove this nonmodal window, along with a section for displaying email header lines -- "From:", "To:", and so on.
The bulk of this window is just another reuse of the TextEditor class object we wrote earlier in the book for the PyEdit program -- PyMailGui simply attaches an instance of TextEditor to every view and compose window, to get a full-featured text editor component for free. In fact, everything but the Cancel button and header lines on this window are implemented by TextEditor, not PyMailGui.

For instance, if we pick the Tools menu of the text portion of this window, and select its Info entry, we get the standard PyEdit TextEditor object's file text statistics box -- the exact same pop-up we'd get in the standalone PyEdit text editor, and the PyView image view programs we wrote in Chapter 9 (see Figure 11-13).

In fact, this is the third reuse of TextEditor in this book: PyEdit, PyView, and now PyMaiGui all present the same text editing interface to users, simply because they all use the same TextEditor object. For purposes of showing source code, we could also simply spawn the PyEdit program with the source file's name as a command-line argument (see PyEdit earlier in the text for more details). PyMailGui attaches an instance instead.

Figure 11-13. PyMailGui attached TextEditor info box

figs/ppy2_1113.gif

To display email, PyMailGui inserts its text into an attached TextEditor object; to compose mail, PyMailGui presents a TextEditor and later fetches all its text out to ship over the Net. Besides the obvious simplification here, this code reuse also makes it easy to pick up improvements and fixes -- any changes in the TextEditor object are automatically inherited by PyMailGui, PyView, and PyEdit.

11.4.4.2 Loading mail

Now, let's go back to the PyMailGui main window, and click the Load button to retrieve incoming email over the POP protocol. Like pymail, PyMailGui's load function gets account parameters from the mailconfig module listed in Example 11-15, so be sure to change this file to reflect your email account parameters (i.e., server names and usernames) if you wish to use PyMailGui to read your own email.

The account password parameter merits a few extra words. In PyMailGui, it may come from one of two places:

Local file: If you put the name of a local file containing the password in the mailconfig module, PyMailGui loads the password from that file as needed.
Pop-up dialog: If you don't put a password filename in mailconfig (or PyMailGui can't load it from the file for whatever reason), PyMailGui will instead ask you for your password any time it is needed.

Figure 11-14 shows the password input prompt you get if you haven't stored your password in a local file. Note that the password you type is not shown -- a show='*' option for the Entry field used in this pop-up tells Tkinter to echo typed characters as stars (this option is similar in spirit to both the getpass console input module we met earlier in this chapter, and an HTML type=password option we'll meet in a later chapter). Once entered, the password lives only in memory on your machine; PyMailGui itself doesn't store it anywhere in a permanent way.

Also notice that the local file password option requires you to store your password unencrypted in a file on the local client computer. This is convenient (you don't need to retype a password every time you check email), but not generally a good idea on a machine you share with others; leave this setting blank in mailconfig if you prefer to always enter your password in a pop-up.

Figure 11-14. PyMailGui password input dialog

Once PyMailGui fetches your mail parameters and somehow obtains your password, it will next attempt to pull down all your incoming email from your POP server. PyMailGui reuses the load-mail tools in the pymail module listed in Example 11-18, which in turn uses Python's standard poplib module to retrieve your email.

11.4.4.3 Threading long-running email transfers

Ultimately, though, the load function must download your email over a socket. If you get as much email as I do, this can take awhile. Rather than blocking the GUI while the load is in progress, PyMailGui spawns a thread to do the mail download operation in parallel with the rest of the program. The main program continues responding to window events (e.g., redrawing the display after another window has been moved over it) while your email is being downloaded. To let you know that a download is in progress in the background, PyMailGui pops up the wait dialog box shown in Figure 11-15.

Figure 11-15. PyMailGui load mail wait box (thread running)

figs/ppy2_1115.gif

This dialog grabs focus and thus effectively disables the rest of the GUI's buttons while a download is in progress. It stays up for the duration of the download, and goes away automatically when the download is complete. Similar wait pop-ups appear during other long-running socket operations (email send and delete operations), but the GUI itself stays alive because the operations run in a thread.

On systems without threads, PyMailGui instead goes into a blocked state during such long-running operations (it stubs out the thread spawn operation to perform a simple function call). Because the GUI is essentially dead without threads, covering and uncovering the GUI during a mail load on such platforms will erase or otherwise distort its contents.^[12] Threads are enabled by default on most platforms that Python runs on (including Windows), so you probably won't see such oddness on your machine.

One note here: as mentioned when we met the FTP GUIs earlier in this chapter, on MS Windows, only the thread that creates windows can process them. Because of that, PyMailGui takes care to not do anything related to the user interface within threads that load, send, or delete email. Instead, the main program continues responding to user interface events and updates, and watches for a global "I'm finished" flag to be set by the email transfer threads. Recall that threads share global (i.e., module) memory; since there is at most only two threads active in PyMailGui at once -- the main program and an email transfer thread -- a single global flag is all the cross-thread communication mechanism we need.

11.4.4.4 Load server interface

Because the load operation is really a socket operation, PyMailGui will automatically connect to your email server using whatever connectivity exists on the machine on which it is run. For instance, if you connect to the Net over a modem, and you're not already connected, Windows automatically pops up the standard connection dialog; Figure 11-16 shows the one I get on my laptop. If PyMailGui runs on a machine with a dedicated Internet link, it uses that instead.

Figure 11-16. PyMailGui connection dialog (Windows)

figs/ppy2_1116.gif

After PyMailGui finishes loading your email, it populates the main window's list box with all of the messages on your email server, and scrolls to the most recently received. Figure 11-17 shows what the main windows looks like on my machine.

Figure 11-17. PyMailGui main window after load

figs/ppy2_1117.gif

Technically, the Load button fetches all your mail the first time it is pressed, but fetches only newly arrived email on later presses. PyMailGui keeps track of the last email loaded, and requests only higher email numbers on later loads. Already-loaded mail is kept in memory, in a Python list, to avoid the cost of downloading it again. Like the simple pymail command-line interface shown earlier, PyMailGui does not delete email from your server when it is loaded; if you really want to not see an email on a later load, you must explicitly delete it (more on this later).

Like most GUIs in this book, the main window can be resized; Figure 11-18 shows what it looks like when stretched to reveal more email details. Entries in the main list show just enough to give the user an idea of what the message contains -- each entry gives the concatenation of portions of the message's "Subject:", "From:", and "Date:" header lines, separated by | characters, and prefixed with the message's POP number (e.g., there are 91 emails in this list). The columns don't always line up neatly (some headers are shorter than others), but it's enough to hint at the message's contents.

Figure 11-18. PyMailGui main window resized

figs/ppy2_1118.gif

As we've seen, much magic happens when downloading email -- the client (the machine on which PyMailGui runs) must connect to the server (your email account machine) over a socket, and transfer bytes over arbitrary Internet links. If things go wrong, PyMailGui pops up standard error dialog boxes to let you know what happened. For example, if PyMailGui cannot establish a connection at all, you'll get a window like that shown in Figure 11-19.

Figure 11-19. PyMailGui connection error box

figs/ppy2_1119.gif

The details displayed here are just the Python exception type and exception data. If you typed an incorrect username or password for your account (in the mailconfig module or in the password pop-up), you'll see the message in Figure 11-20.

Figure 11-20. PyMailGui invalid password error box

figs/ppy2_1120.gif

This box shows the exception raised by the Python poplib module. If PyMailGui cannot contact your server (e.g., it's down, or you listed its name wrong in mailconfig), you'll get the pop-up shown in Figure 11-21.

Figure 11-21. PyMailGui invalid or down server error box

figs/ppy2_1121.gif

11.4.4.5 Sending email

Once we've loaded email, we can process our messages with buttons on the main window. We can, however, send new emails at any time, even before a load. Pressing the Write button on the main window generates a mail composition window; one has been captured in Figure 11-22.

Figure 11-22. PyMailGui write mail compose window

figs/ppy2_1122.gif

This window is just like the help source code viewer we saw a moment ago -- it has fields for entering header line details, and an attached TextEditor object for writing the body of the new email. For write operations, PyMailGui automatically fills the "From" line and inserts a signature text line (" -- Mark..."), from your mailconfig module settings. You can change these to any text you like, but the defaults are filled in automatically from your mailconfig.

There is also a new "Send" button here: when pressed, the text you typed into the the body of this window is mailed to the addresses you typed into the "To" and "Cc" lines, using Python's smtplib module. PyMailGui adds the header fields you type as mail header lines in the sent message. To send to more than one address, separate them with a ";" in the "To" and "Cc" lines (we'll see an example of this in a moment). In this mail, I fill in the "To" header with my own email address, to send the message to myself for illustration purposes.

As we've seen, smtplib ultimately sends bytes to a server over a socket. Since this can be a long-running operation, PyMailGui delegates this operation to a spawned thread, too. While the send thread runs, the wait window in Figure 11-23 appears, and the entire GUI stays alive; redraw and move events are handled in the main program thread, while the send thread talks to the SMTP server.

Figure 11-23. PyMailGui send mail wait box (thread running)

figs/ppy2_1123.gif

You'll get an error pop-up if Python cannot send a message to any of the target recipients, for any reason. If you don't get an error pop-up, everything worked correctly, and your mail will show up in the recipients' mailboxes on their email servers. Since I sent the message above to myself, it shows up in mine the next time I press the main window's Load button, as we see in Figure 11-24.

Figure 11-24. PyMailGui main window after sends, load

figs/ppy2_1124.gif

If you look back to the last main window shot, you'll notice that there are only two new emails now -- numbers 92 (from Python-Help) and 93 (the one I just wrote); PyMailGui is smart enough to download only the two new massages, and tack them onto the end of the loaded email list.

11.4.4.6 Viewing email

Now, let's view the mail message that was sent and received. PyMailGui lets us view email in formatted or raw modes. First, highlight (single-click) the mail you want to see in the main window, and press the View button. A formatted mail viewer window like that shown in Figure 11-25 appears.

Figure 11-25. PyMailGui view incoming mail window

figs/ppy2_1125.gif

This is the exact same window we saw displaying source code earlier, only now all the information is filled in by extracting bits of the selected email message. Python's rfc822 module is used to parse out header lines from the raw text of the email message; their text is placed into the fields in the top right of the window. After headers are parsed, the message's body text is left behind (in a StringIO file-like string wrapper), and is read and stuffed into a new TextEditor object for display (the white part in the middle of the window).

Besides the nicely formatted view window, PyMailGui also lets us see the raw text of a mail message. Double-click on a message's entry in the main window's list to bring up a simple unformatted display of the mail's text. The raw version of the mail I sent to myself is shown in Figure 11-26.

Figure 11-26. PyMailGui raw mail text view window

figs/ppy2_1126.gif

This raw text display can be useful to see special mail headers not shown in the formatted view. For instance, the optional "X-Mailer:" header in the raw text display identifies the program that transmitted a message; PyMailGui adds it automatically, along with standard headers like "From:" and "To:". Other headers are added as the mail is transmitted: the "Received:" headers name machines that the message was routed through on its way to our email server.

And really, the raw text form is all there is to an email message -- it's what is transferred from machine to machine when mail is sent. The nicely formatted display simply parses out pieces of the mail's raw text with standard Python tools and places them in associated fields of the display of Figure 11-25.

11.4.4.7 Email replies and forwards

Besides allowing reading and writing email, PyMailGui also lets users forward and reply to incoming email sent from others. To reply to an email, select its entry in the main window's list and click the Reply button. If I reply to the mail I just sent to myself (arguably narcissistic, but demonstrative), the mail composition window shown in Figure 11-27 appears.

Figure 11-27. PyMailGui reply compose window

figs/ppy2_1127.gif

This window is identical in format to the one we saw for the "Write" operation, except that PyMailGui fills in some parts automatically:

The "From" line is set to your email address in your mailconfig module.
The "To" line is initialized to the original message's "From" address (we're replying to the original sender, after all). See the sidebar "More on Reply Addresses" for additional details on the target address.
The "Subject" line is set to the original message's subject line prepended with a "Re:", the standard follow-up subject line form.
The body of the reply is initialized with the signature line in mailconfig, along with the original mail message's text. The original message text is quoted with > characters and prepended with a few header lines extracted from the original message to give some context.

Luckily, all of this is much easier than it may sound. Python's standard rfc822 module is used to extract all the original message's header lines, and a single string.replace call does the work of adding the > quotes to the original message body. I simply type what I wish to say in reply (the initial paragraph in the mail's text area), and press the Send button to route the reply message to the mailbox on my mail server again. Physically sending the reply works the same as sending a brand new message -- the mail is routed to your SMTP server in a spawned send mail thread, and the send mail wait pop-up appears.

Forwarding a message is similar to replying: select the message in the main window, press the "Fwd" button, and fill in the fields and text area of the popped-up composition window. Figure 11-28 shows the window created to forward the mail we originally wrote and received.

Figure 11-28. PyMailGui forward compose window

figs/ppy2_1128.gif

Much like replies, "From" is filled from mailconfig, the original text is automatically quoted in the message body again, and the subject line is preset to the original message's subject prepended with the string "Fwd:". I have to fill in the "To" line manually, though, because this is not a direct reply (it doesn't necessarily go back to the original sender). Notice that I'm forwarding this message to two different addresses; multiple recipient addresses are separated with a ";" character in "To" and "Cc" header fields. The Send button in this window fires the forward message off to all addresses listed in "To" and "Cc".

Figure 11-29. PyMailGui mail list after sends and load

figs/ppy2_1129.gif

Okay, I've now written a new message, and replied to and forwarded it. The reply and forward were sent to my email address, too; if we press the main window's Load button again, the reply and forward messages should show up in the main window's list. In Figure 11-29, they appear as messages 94 and 95.

Keep in mind that PyMailGui runs on the local computer, but the messages you see in the main window's list actually live in a mailbox on your email server machine. Every time we press Load, PyMailGui downloads but does not delete newly arrived email from the server to your computer. The three messages we just wrote (93-95) will also appear in any other email program you use on your account (e.g., in Outlook). PyMailGui does not delete messages as they are downloaded, but simply stores them in your computer's memory for processing. If we now select message 95 and press View, we see the forward message we sent, as in Figure 11-30. Really, this message went from my machine to a remote email server, and was downloaded from there into a Python list from which it is displayed.

Figure 11-30. PyMailGui view forwarded mail

figs/ppy2_1130.gif

Figure 11-31 shows what the forward message's raw text looks like; again, double-click on a main window's entry to display this form. The formatted display in Figure 11-30 simply extracts bits and pieces out of the text shown in the raw display form.

Figure 11-31. PyMailGui view forwarded mail, raw

figs/ppy2_1131.gif

11.4.4.8 Saving and deleting email

So far, we've covered everything except two of the main window's processing buttons and the All checkbox. PyMailGui lets us save mail messages in local text files, and delete messages from the server permanently, such that we won't see them the next time we access our account. Moreover, we can save and delete a single mail at a time, or all mails displayed in the main windows list:

To save one email, select it in the main window's list and press the Save button.
To save all the emails in the list in one step, click the All checkbox at the bottom right corner of the main window and then press Save.

More on Reply Addresses

A subtle thing: technically, the "To" address in replies is made from whatever we get back from a standard library call of the form hdrs.getaddr('From') -- an rfc822 module interface that parses and formats the original message sender's address automatically -- plus quotes added in a few rare cases.

Refer to function onReplyMail in the code listings. This library call returns a pair ( full name, email address) parsed from the mail's "From:" header line. For instance, if a mail's first "From:" header contains the string:

'[email protected] (Joe Blow)'

then a call hdrs.getaddr('From') will yield the pair ('Joe Blow', '[email protected]'), with an empty name string if none exists in the original sender address string. If the header contains:

'Joe Blow <[email protected]>'

instead, the call yields the exact same result tuple.

Unfortunately, though, the Python 1.5.2 rfc822 module had a bug that makes this call not always correct: the getaddr function yields bogus results if a full name part of the address contains a comma (e.g., "Blow, Joe"). This bug may be fixed in Python 2.0, but to work around it for earlier releases, PyMailGui puts the name part in explicit quotes if it contains a comma, before stuffing it into the target full-name <email-address> address used in the "To:" line of replies. For example, here are four typical "From" addresses and the reply "To" address PyMailGui generates for each (after the =>):

[email protected]  => <[email protected]>
Joe Blow <[email protected]>    => Joe Blow <[email protected]>
[email protected] (Joe Blow)    => Joe Blow <[email protected]> 
"Blow, Joe" <[email protected]> => "Blow, Joe" <[email protected]>

Without the added quotes around the name in the last of these, the comma would confuse my SMTP server into seeing two recipients -- [email protected] and Joe <[email protected] > (the first incorrectly gets my ISP's domain name added because it is assumed to be a local user). The added quotes won't hurt if the bug is removed in later releases.

A less complex alternative solution (and one we'll use in a program called PyMailCgi later in this book) is to simply use the original "From" address exactly as the reply's "To". A library call of the form hdrs.get('From') would return the sender's address verbatim, quotes and all, without trying to parse out its components at all.

As coded, the PyMailGui reply address scheme works on every message I've ever replied to, but may need to be tweaked for some unique address formats or future Python releases. I've tested and used this program a lot, but much can happen on the Net, despite mail address standards. Officially speaking, any remaining bugs you find in it are really suggested exercises in disguise (at least I didn't say they were "features").

Delete operations are kicked off the same way, but press the Del button instead. In typical operation, I eventually delete email I'm not interested in, and save and delete emails that are important. Save operations write the raw text of one or more emails to a local text file you pick in the pop-up dialog shown in Figure 11-32.

Figure 11-32. PyMailGui save mail dialog

figs/ppy2_1132.gif

Technically, saves always append raw message text to the chosen file; the file is opened in 'a' mode, which creates the file if needed, and writes at its end. The save operation is also smart enough to remember the last directory you selected; the file dialog begins navigation there the next time you press Save.

Delete operations can also be applied to one or all messages. Unlike other operations, though, delete requests are simply queued up for later action. Messages are actually deleted from your mail server only as PyMailGui is exiting. For instance, if we've selected some messages for deletion and press the main window's Quit button, a standard verification dialog appears (Figure 11-33).

Figure 11-33. PyMailGui quit verification

figs/ppy2_1133.gif

If we then verify the quit request, a second dialog appears (Figure 11-34), asking us to verify deletion of the queued up messages. If we press No here, no deletes happen, and PyMailGui silently exits. If we select Yes, PyMailGui spawns one last thread to send deletion requests to the email server for all the emails selected for deletion during the session. Another wait-state pop-up appears while the delete thread is running; when that thread is finished, PyMailGui exits as well.

Figure 11-34. PyMailGui delete verification on quit

figs/ppy2_1134.gif

By default and design, no mail is ever removed: you will see the same messages the next time PyMailGui runs. It deletes mail from your server only when you ask it to, deletes messages only on exit, and then only if verified in the last pop-up shown (this is your last chance to prevent permanent mail removal).

11.4.4.9 POP message numbers

This may seem a roundabout way to delete mail, but it accommodates a property of the POP interface. POP assigns each message a sequential number, starting from 1, and these numbers are passed to the server to fetch and delete messages. It's okay if new mail arrives while we're displaying the result of a prior download -- the new mail is assigned higher numbers, beyond what is displayed on the client. But if we delete a message in the middle of a mailbox, the numbers of all messages after the one deleted change (they are decremented by one). That means that some message numbers may be no longer valid if deletions are made while viewing previously loaded email (deleting by some number N may really delete message N+1!).

PyMailGui could adjust all the displayed numbers to work around this. To keep things simple, though, it postpones deletions instead. Notice that if you run multiple instances of PyMailGui at once, you shouldn't delete in one and then another because message numbers may become confused. You also may not be happy with the results of running something like Outlook at the same time as a PyMailGui session, but the net effect of such a combination depends on how another mail client handles deletions. In principle, PyMailGui could be extended to prevent other instances from running at the same time, but we leave that as an exercise.

11.4.4.10 Windows and status messages

Before we close this section, I want to point out that PyMailGui is really meant to be a multiple-window interface -- something not made obvious by the earlier screen shots. For example, Figure 11-35 shows PyMailGui with a main list box, help, and three mail view windows. All these windows are nonmodal; that is, they are all active and independent, and do not block other windows from being selected. This interface all looks slightly different on Linux, but has the same functionality.

Figure 11-35. PyMailGui multiple windows and text editors

figs/ppy2_1135.gif

In general, you can have any number of mail view or compose windows up at once, and cut and paste between them. This matters, because PyMailGui must take care to make sure that each window has a distinct text editor object. If the text editor object was a global, or used globals internally, you'd likely see the same text in each window (and the send operations might wind up sending text from another window). To avoid this, PyMailGui creates and attaches a new TextEditor instance to each view and compose window it creates, and associates the new editor with the Send button's callback handler to make sure we get the right text.

Finally, PyMailGui prints a variety of status messages as it runs, but you see them only if you launch the program from the system command line (e.g., a DOS box on Windows or an xterm on Linux), or by double-clicking on its filename icon (its main script is a .py, not a .pyw). On Windows, you won't see these message when it is started from another program, such as the PyDemos or PyGadgets launcher bar GUIs. These status messages print server information, show mail loading status, and trace the load, store, and delete threads that are spawned along the way. If you want PyMailGui to be more verbose, launch it from a command line and watch:

C:\...\PP2E\Internet\Email>python PyMailGui.py
load start
Connecting...
+OK Cubic Circle's v1.31 1998/05/13 POP3 ready <594100005a655e39@chevalier>
('+OK 5 messages (8470 octets)', ['1 709', '2 1136', '3 998', '4 2679', 
'5 2948'], 38)
There are 5 mail messages in 8470 bytes
Retrieving: 1 2 3 4 5
load exit
thread exit caught
send start
Connecting to mail... ['<[email protected]>']
send exit
thread exit caught

You can also double-click on the PyMailGui.py filename in your file explorer GUI and monitor the popped-up DOS console box on Windows; Figure 11-36 captures this window in action.

Figure 11-36. PyMailGui console status message window

figs/ppy2_1136.gif

PyMailGui status messages display the mail currently being downloaded (i.e., the "Retrieving:" lines are appended with a new mail number as each message is downloaded), and so give a more informative download status indicator than the wait pop-up window.

11.4.5 Implementing PyMailGui

Last but not least, we get to the code. There are really only two new modules here: one where the help text is stored and another that implements the system.

In fact, PyMailGui gets a lot of mileage out of reusing modules we wrote earlier and won't repeat here: pymail for mail load and delete operations, mailconfig for mail parameters, the GUI section's TextEditor for displaying and editing mail message text, and so on. In addition, standard Python modules used here such as poplib, smtplib, and rfc822 hide most of the details of pushing bytes around the Net and extracting message components. Tkinter implements GUI components in a portable fashion.

11.4.5.1 Help text module

The net effect of all this reuse is that PyMailGui implements a fairly feature-rich email program in roughly 500 lines of code, plus one support module. Example 11-22 shows the support module -- used only to define the help text string, to avoid cluttering the main script file.

Example 11-22. PP2E\Internet\Email\PyMailGuiHelp.py

#####################################################################
# PyMailGui help text string, in this seperate module only to avoid
# distracting from executable code.  As coded, we throw up this text
# in a simple scrollable text box; in the future, we might instead 
# use an HTML file opened under a web browser (e.g., run a "netscape
# help.html" or DOS "start help.html" command using os.system call
#####################################################################

# must be narrow for Linux info box popups; 
# now uses scrolledtext with buttons instead;

helptext = """
PyMail, version 1.0
February, 2000
Programming Python, 2nd Edition
O'Reilly & Associates

Click main window buttons to process email: 
- Load:\t fetch all (or newly arrived) POP mail from server
- View:\t display selected message nicely formatted
- Save:\t write selected (or all) emails to a chosen file
- Del:\t mark selected (or all) email to be deleted on exit
- Write:\t compose a new email message, send it by SMTP
- Reply:\t compose a reply to selected email, send it by SMTP
- Fwd:\t compose a forward of selected email, send by SMTP
- Quit:\t exit PyMail, delete any marked emails from server 

Click an email in the main window's listbox to select it.
Click the "All" checkbox to apply Save or Del buttons to 
all retrieved emails in the list.  Double-click on an email
in the main window's listbox to view the mail's raw text, 
including mail headers not shown by the View button.

Mail is removed from POP servers on exit only, and only mails
marked for deletion with the Del button are removed, if and 
only if you verify the deletion in a confirmation popup.  

Change the mailconfig.py module file on your own machine to 
reflect your email server names, user name, email address, 
and optional mail signature line added to all composed mails.
Miscellaneous hints:

- Passwords are requested if needed, and not stored by PyMail.
- You may put your password in a file named in mailconfig.py.
- Use ';' between multiple addresses in "To" and "Cc" headers. 
- Reply and Fwd automatically quote the original mail text.  
- Save pops up a dialog for selecting a file to hold saved mails.
- Load only fetches newly-arrived email after the first load.

This client-side program currently requires Python and Tkinter.
It uses Python threads, if installed, to avoid blocking the GUI.
Sending and loading email requires an Internet connection.
"""

if __name__ == '__main__': 
    print helptext                    # to stdout if run alone
    raw_input('Press Enter key')      # pause in DOS console popups

11.4.5.2 Main module

And finally, here is the main PyMailGui script -- the file run to start the system (see Example 11-23). I've already told what it does and why, so studying this listing's code and its comments for a deeper look is left as a suggested exercise. Python is so close to pseudocode already, that additional narrative here would probably be redundant.

Although I use this example on a daily basis as is, it is also prime for extension. For instance:

Deleted messages could be marked as such graphically.
Email attachments could be displayed, parsed out, decoded, and opened automatically when clicked using the Python multipart email extraction tools we met earlier in this chapter.
Download status could be made more informative during mail load operations by updating a progress bar after each fetch (e.g., by periodically reconfiguring the size of a rectangle drawn on a popped-up canvas).
Hyperlink URLs within messages could be highlighted visually and made to spawn a web browser automatically when clicked, by using the launcher tools we met in the GUI and system tools parts of this book.
Because Internet newsgroup posts are similar in structure to emails (header lines plus body text: see the nntplib example in the next section), this script could in principle be extended to display both email messages and news articles. Classifying such a possible mutation as clever generalization or diabolical hack is left as an exercise in itself.
This script still uses the nonstandard but usually harmless sending date format discussed in an earlier sidebar in this chapter; it would be trivial to import a conforming date format function from the pymail module.
PyMailGui displays a wait dialog box during mail transfers that effectively disables the rest of the GUI. This is by design, to minimize timing complexity. In principle, though, the system could allow mail operation threads to overlap in time (e.g., allow the user to send new messages while a download is in progress). Since each transfer runs on a socket of its own, PyMailGui need not block other operations during transfers. This might be implemented with periodic Tkinter after events that check the status of transfers in progress. See the PyFtpGui scripts earlier in this chapter for an example of overlapping transfer threads.

And so on; because this software is open source, it is also necessarily open-ended. Suggested exercises in this category are delegated to your imagination.

Example 11-23. PP2E\Internet\Email\PyMailGui.py

######################################################
# PyMailGui 1.0 - A Python/Tkinter email client.
# Adds a Tkinter-based GUI interface to the pymail
# script's functionality.  Works for POP/SMTP based
# email accounts using sockets on the machine on 
# which this script is run.  Uses threads if 
# installed to run loads, sends, and deletes with
# no blocking; threads are standard on Windows.
# GUI updates done in main thread only (Windows).
# Reuses and attaches TextEditor class object.
# Run from command-line to see status messages.
# See use notes in help text in PyMailGuiHelp.py.
# To do: support attachments, shade deletions.
######################################################

# get services
import pymail, mailconfig
import rfc822, StringIO, string, sys
from Tkinter       import *
from tkFileDialog  import asksaveasfilename, SaveAs
from tkMessageBox  import showinfo, showerror, askyesno
from PP2E.Gui.TextEditor.textEditor import TextEditorComponentMinimal

# run if no threads
try:                                     # raise ImportError to
    import thread                        # run with gui blocking
except ImportError:                      # no wait popups appear 
    class fakeThread:
        def start_new_thread(self, func, args):
            apply(func, args)
    thread = fakeThread()

# init global/module vars
msgList       = []                       # list of retrieved emails text
toDelete      = []                       # msgnums to be deleted on exit
listBox       = None                     # main window's scrolled msg list 
rootWin       = None                     # the main window of this program
allModeVar    = None                     # for All mode checkbox value
threadExitVar = 0                        # used to signal child thread exit
debugme       = 0                        # enable extra status messages

mailserver = mailconfig.popservername    # where to read pop email from
mailuser   = mailconfig.popusername      # smtp server in mailconfig too
mailpswd   = None                        # pop passwd via file or popup here
#mailfile  = mailconfig.savemailfile     # from a file select dialog here


def fillIndex(msgList):
    # fill all of main listbox
    listBox.delete(0, END)
    count = 1
    for msg in msgList:
        hdrs = rfc822.Message(StringIO.StringIO(msg))
        msginfo = '%02d' % count
        for key in ('Subject', 'From', 'Date'):
            if hdrs.has_key(key): msginfo = msginfo + ' | ' + hdrs[key][:30]
        listBox.insert(END, msginfo)
        count = count+1
    listBox.see(END)         # show most recent mail=last line 


def selectedMsg():
    # get msg selected in main listbox
    # print listBox.curselection()
    if listBox.curselection() == ():
        return 0                                     # empty tuple:no selection
    else:                                            # else zero-based index
        return eval(listBox.curselection()[0]) + 1   # in a 1-item tuple of str


def waitForThreadExit(win):
    import time
    global threadExitVar          # in main thread, watch shared global var
    delay = 0.0                   # 0.0=no sleep needed on Win98 (but hogs cpu)
    while not threadExitVar:
        win.update()              # dispatch any new GUI events during wait
        time.sleep(delay)         # if needed, sleep so other thread can run
    threadExitVar = 0             # at most one child thread active at once


def busyInfoBoxWait(message):
    # popup wait message box, wait for a thread exit
    # main gui event thread stays alive during wait
    # as coded returns only after thread has finished
    # popup.wait_variable(threadExitVar) may work too

    popup = Toplevel()
    popup.title('PyMail Wait')
    popup.protocol('WM_DELETE_WINDOW', lambda:0)       # ignore deletes    
    label = Label(popup, text=message+'...')
    label.config(height=10, width=40, cursor='watch')  # busy cursor
    label.pack()
    popup.focus_set()                                  # grab application
    popup.grab_set()                                   # wait for thread exit
    waitForThreadExit(popup)                           # gui alive during wait
    print 'thread exit caught'
    popup.destroy() 


def loadMailThread():
    # load mail while main thread handles gui events
    global msgList, errInfo, threadExitVar
    print 'load start'
    errInfo = ''
    try:
        nextnum = len(msgList) + 1
        newmail = pymail.loadmessages(mailserver, mailuser, mailpswd, nextnum)
        msgList = msgList + newmail
    except:
        exc_type, exc_value = sys.exc_info()[:2]                # thread exc
        errInfo = '\n' + str(exc_type) + '\n' + str(exc_value)
    print 'load exit'
    threadExitVar = 1   # signal main thread


def onLoadMail():
    # load all (or new) pop email
    getpassword()
    thread.start_new_thread(loadMailThread, ())
    busyInfoBoxWait('Retrieving mail')
    if errInfo:
        global mailpswd            # zap pswd so can reinput
        mailpswd = None
        showerror('PyMail', 'Error loading mail\n' + errInfo)
    fillIndex(msgList)


def onViewRawMail():
    # view selected message - raw mail text with header lines
    msgnum = selectedMsg()
    if not (1 <= msgnum <= len(msgList)):
        showerror('PyMail', 'No message selected')
    else:
        text = msgList[msgnum-1]                # put in ScrolledText
        from ScrolledText import ScrolledText
        window  = Toplevel()
        window.title('PyMail raw message viewer #' + str(msgnum))
        browser = ScrolledText(window)
        browser.insert('0.0', text)
        browser.pack(expand=YES, fill=BOTH)


def onViewFormatMail():
    # view selected message - popup formatted display
    msgnum = selectedMsg()
    if not (1 <= msgnum <= len(msgList)):
        showerror('PyMail', 'No message selected')
    else:
        mailtext = msgList[msgnum-1]            # put in a TextEditor form
        textfile = StringIO.StringIO(mailtext)
        headers  = rfc822.Message(textfile)     # strips header lines
        bodytext = textfile.read()              # rest is message body
        editmail('View #%d' % msgnum,
                  headers.get('From', '?'), 
                  headers.get('To', '?'), 
                  headers.get('Subject', '?'), 
                  bodytext,
                  headers.get('Cc', '?')) 


# use objects that retain prior directory for the next
# select, instead of simple asksaveasfilename() dialog

saveOneDialog = saveAllDialog = None

def myasksaveasfilename_one():
    global saveOneDialog
    if not saveOneDialog:
        saveOneDialog = SaveAs(title='PyMail Save File')
    return saveOneDialog.show()


def myasksaveasfilename_all():
    global saveAllDialog
    if not saveAllDialog:
        saveAllDialog = SaveAs(title='PyMail Save All File')
    return saveAllDialog.show()


def onSaveMail():
    # save selected message in file
    if allModeVar.get():
        mailfile = myasksaveasfilename_all()
        if mailfile:
            try:
                # maybe this should be a thread
                for i in range(1, len(msgList)+1):
                    pymail.savemessage(i, mailfile, msgList)
            except:
                showerror('PyMail', 'Error during save')
    else:
        msgnum = selectedMsg()
        if not (1 <= msgnum <= len(msgList)):
            showerror('PyMail', 'No message selected')
        else:
            mailfile = myasksaveasfilename_one()
            if mailfile:
                try:
                    pymail.savemessage(msgnum, mailfile, msgList)    
                except:
                    showerror('PyMail', 'Error during save')


def onDeleteMail():
    # mark selected message for deletion on exit
    global toDelete
    if allModeVar.get():
        toDelete = range(1, len(msgList)+1)
    else:
        msgnum = selectedMsg()
        if not (1 <= msgnum <= len(msgList)):
            showerror('PyMail', 'No message selected')
        elif msgnum not in toDelete:
            toDelete.append(msgnum)    # fails if in list twice


def sendMailThread(From, To, Cc, Subj, text):
    # send mail while main thread handles gui events
    global errInfo, threadExitVar
    import smtplib, time
    from mailconfig import smtpservername
    print 'send start'

    date  = time.ctime(time.time())
    Cchdr = (Cc and 'Cc: %s\n' % Cc) or ''
    hdrs  = ('From: %s\nTo: %s\n%sDate: %s\nSubject: %s\n' 
                     % (From, To, Cchdr, date, Subj))
    hdrs  = hdrs + 'X-Mailer: PyMailGui Version 1.0 (Python)\n'

    Ccs = (Cc and string.split(Cc, ';')) or []     # some servers reject ['']
    Tos = string.split(To, ';') + Ccs              # cc: hdr line, and To list
    Tos = map(string.strip, Tos)                   # some addrs can have ','s
    print 'Connecting to mail...', Tos             # strip spaces around addrs

    errInfo = ''
    failed  = {}                                   # smtplib may raise except 
    try:                                           # or return failed Tos dict
        server = smtplib.SMTP(smtpservername) 
        failed = server.sendmail(From, Tos, hdrs + text)
        server.quit()
    except:
        exc_type, exc_value = sys.exc_info()[:2]   # thread exc
        excinfo = '\n' + str(exc_type) + '\n' + str(exc_value)
        errInfo = 'Error sending mail\n' + excinfo
    else:
        if failed: errInfo = 'Failed recipients:\n' + str(failed)

    print 'send exit'
    threadExitVar = 1                              # signal main thread


def sendMail(From, To, Cc, Subj, text):
    # send completed email
    thread.start_new_thread(sendMailThread, (From, To, Cc, Subj, text))
    busyInfoBoxWait('Sending mail') 
    if errInfo:
        showerror('PyMail', errInfo)


def onWriteReplyFwdSend(window, editor, hdrs):
    # mail edit window send button press
    From, To, Cc, Subj = hdrs
    sendtext = editor.getAllText()
    sendMail(From.get(), To.get(), Cc.get(), Subj.get(), sendtext)
    if not errInfo: 
        window.destroy()    # else keep to retry or save   


def editmail(mode, From, To='', Subj='', origtext='', Cc=''):
    # create a new mail edit/view window
    win = Toplevel()
    win.title('PyMail - '+ mode)
    win.iconname('PyMail')
    viewOnly = (mode[:4] == 'View')

    # header entry fields
    frm =  Frame(win); frm.pack( side=TOP,   fill=X)
    lfrm = Frame(frm); lfrm.pack(side=LEFT,  expand=NO,  fill=BOTH)
    mfrm = Frame(frm); mfrm.pack(side=LEFT,  expand=NO,  fill=NONE)
    rfrm = Frame(frm); rfrm.pack(side=RIGHT, expand=YES, fill=BOTH)
    hdrs = []
    for (label, start) in [('From:', From),
                           ('To:',   To),           # order matters on send
                           ('Cc:',   Cc), 
                           ('Subj:', Subj)]:
        lab = Label(mfrm, text=label, justify=LEFT)
        ent = Entry(rfrm)
        lab.pack(side=TOP, expand=YES, fill=X)
        ent.pack(side=TOP, expand=YES, fill=X)
        ent.insert('0', start)
        hdrs.append(ent)

    # send, cancel buttons (need new editor)
    editor = TextEditorComponentMinimal(win)
    sendit = (lambda w=win, e=editor, h=hdrs: onWriteReplyFwdSend(w, e, h))

    for (label, callback) in [('Cancel', win.destroy), ('Send', sendit)]:
        if not (viewOnly and label == 'Send'): 
            b = Button(lfrm, text=label, command=callback)
            b.config(bg='beige', relief=RIDGE, bd=2)
            b.pack(side=TOP, expand=YES, fill=BOTH)

    # body text editor: pack last=clip first
    editor.pack(side=BOTTOM)                         # may be multiple editors
    if (not viewOnly) and mailconfig.mysignature:    # add auto signature text?
        origtext = ('\n%s\n' % mailconfig.mysignature) + origtext
    editor.setAllText(origtext)


def onWriteMail():
    # compose new email
    editmail('Write', From=mailconfig.myaddress)


def quoteorigtext(msgnum):
    origtext = msgList[msgnum-1]
    textfile = StringIO.StringIO(origtext)
    headers  = rfc822.Message(textfile)           # strips header lines
    bodytext = textfile.read()                    # rest is message body
    quoted   = '\n-----Original Message-----\n'
    for hdr in ('From', 'To', 'Subject', 'Date'):
        quoted = quoted + ( '%s: %s\n' % (hdr, headers.get(hdr, '?')) )
    quoted   = quoted + '\n' + bodytext
    quoted   = '\n' + string.replace(quoted, '\n', '\n> ')
    return quoted 


def onReplyMail():
    # reply to selected email
    msgnum = selectedMsg()
    if not (1 <= msgnum <= len(msgList)):
        showerror('PyMail', 'No message selected')
    else:
        text = quoteorigtext(msgnum)
        hdrs = rfc822.Message(StringIO.StringIO(msgList[msgnum-1]))
        toname, toaddr = hdrs.getaddr('From')
        if toname and ',' in toname: toname = '"%s"' % toname
        To   = '%s <%s>' % (toname, toaddr)
        From = mailconfig.myaddress or ('%s <%s>' % hdrs.getaddr('To'))
        Subj = 'Re: ' + hdrs.get('Subject', '(no subject)')
        editmail('Reply', From, To, Subj, text)


def onFwdMail():
    # forward selected email
    msgnum = selectedMsg()
    if not (1 <= msgnum <= len(msgList)):
        showerror('PyMail', 'No message selected')
    else:
        text = quoteorigtext(msgnum)
        hdrs = rfc822.Message(StringIO.StringIO(msgList[msgnum-1]))
        From = mailconfig.myaddress or ('%s <%s>' % hdrs.getaddr('To')) 
        Subj = 'Fwd: ' + hdrs.get('Subject', '(no subject)')
        editmail('Forward', From, '', Subj, text)


def deleteMailThread(toDelete):
    # delete mail while main thread handles gui events
    global errInfo, threadExitVar
    print 'delete start'
    try:
        pymail.deletemessages(mailserver, mailuser, mailpswd, toDelete, 0)
    except:
        exc_type, exc_value = sys.exc_info()[:2]
        errInfo = '\n' + str(exc_type) + '\n' + str(exc_value)
    else:
        errInfo = ''
    print 'delete exit'
    threadExitVar = 1   # signal main thread


def onQuitMail():
    # exit mail tool, delete now
    if askyesno('PyMail', 'Verify Quit?'):
        if toDelete and askyesno('PyMail', 'Really Delete Mail?'):
            getpassword()
            thread.start_new_thread(deleteMailThread, (toDelete,))
            busyInfoBoxWait('Deleting mail')
            if errInfo:
                showerror('PyMail', 'Error while deleting:\n' + errInfo)
            else:
                showinfo('PyMail', 'Mail deleted from server')
        rootWin.quit()


def askpassword(prompt, app='PyMail'):    # getpass.getpass uses stdin, not GUI
    win = Toplevel()                      # tkSimpleDialog.askstring echos input
    win.title(app + ' Prompt')
    Label(win, text=prompt).pack(side=LEFT)
    entvar = StringVar()
    ent = Entry(win, textvariable=entvar, show='*')
    ent.pack(side=RIGHT, expand=YES, fill=X)
    ent.bind('<Return>', lambda event, savewin=win: savewin.destroy())
    ent.focus_set(); win.grab_set(); win.wait_window()
    win.update()
    return entvar.get()    # ent widget is now gone


def getpassword():
    # unless known, set global pop password 
    # from client-side file or popup dialog
    global mailpswd
    if mailpswd:                
        return                  
    else:
        try:
            localfile = open(mailconfig.poppasswdfile)
            mailpswd  = localfile.readline()[:-1]
            if debugme: print 'local file password', repr(mailpswd)
        except:
            prompt    = 'Password for %s on %s?' % (mailuser, mailserver)
            mailpswd  = askpassword(prompt)
            if debugme: print 'user input password', repr(mailpswd)


def decorate(rootWin):
    # window manager stuff for main window
    rootWin.title('PyMail 1.0')
    rootWin.iconname('PyMail')
    rootWin.protocol('WM_DELETE_WINDOW', onQuitMail)


def makemainwindow(parent=None):
    # make the main window
    global rootWin, listBox, allModeVar
    if parent:
        rootWin = Frame(parent)             # attach to a parent
        rootWin.pack(expand=YES, fill=BOTH)
    else:       
        rootWin = Tk()                      # assume I'm standalone
        decorate(rootWin)

    # add main buttons at bottom
    frame1 = Frame(rootWin)
    frame1.pack(side=BOTTOM, fill=X)
    allModeVar = IntVar()
    Checkbutton(frame1, text="All", variable=allModeVar).pack(side=RIGHT)
    actions = [ ('Load',  onLoadMail),  ('View',  onViewFormatMail),
                ('Save',  onSaveMail),  ('Del',   onDeleteMail),
                ('Write', onWriteMail), ('Reply', onReplyMail), 
                ('Fwd',   onFwdMail),   ('Quit',  onQuitMail) ]
    for (title, callback) in actions:
        Button(frame1, text=title, command=callback).pack(side=LEFT, fill=X)

    # add main listbox and scrollbar
    frame2  = Frame(rootWin)
    vscroll = Scrollbar(frame2)
    fontsz  = (sys.platform[:3] == 'win' and 8) or 10
    listBox = Listbox(frame2, bg='white', font=('courier', fontsz))
    
    # crosslink listbox and scrollbar
    vscroll.config(command=listBox.yview, relief=SUNKEN)
    listBox.config(yscrollcommand=vscroll.set, relief=SUNKEN, selectmode=SINGLE)
    listBox.bind('<Double-1>', lambda event: onViewRawMail())
    frame2.pack(side=TOP, expand=YES, fill=BOTH)
    vscroll.pack(side=RIGHT, fill=BOTH)
    listBox.pack(side=LEFT, expand=YES, fill=BOTH)
    return rootWin


# load text block string
from PyMailGuiHelp import helptext

def showhelp(helptext=helptext, appname='PyMail'):   # show helptext in
    from ScrolledText import ScrolledText            # a non-modal dialog 
    new  = Toplevel()                                # make new popup window
    bar  = Frame(new)                                # pack first=clip last
    bar.pack(side=BOTTOM, fill=X)
    code = Button(bar, bg='beige', text="Source", command=showsource)
    quit = Button(bar, bg='beige', text="Cancel", command=new.destroy)
    code.pack(pady=1, side=LEFT)
    quit.pack(pady=1, side=LEFT)
    text = ScrolledText(new)                         # add Text + scrollbar
    text.config(font='system',  width=70)            # too big for showinfo
    text.config(bg='steelblue', fg='white')          # erase on btn or return
    text.insert('0.0', helptext)
    text.pack(expand=YES, fill=BOTH)
    new.title(appname + " Help")
    new.bind("<Return>", (lambda event, new=new: new.destroy()))


def showsource():
    # tricky, but open
    try:                                            # like web getfile.cgi
        source = open('PyMailGui.py').read()        # in cwd or below it?
    except: 
        try:                                        # or use find.find(f)[0],
            import os                               # $PP2EHOME, guessLocation
            from PP2E.Launcher import findFirst     # or spawn pyedit with arg
            here   = os.curdir
            source = open(findFirst(here, 'PyMailGui.py')).read()
        except:
            source = 'Sorry - cannot find my source file'
    subject = 'Main script [see also: PyMailGuiHelp, pymail, mailconfig]'
    editmail('View Source Code', 'PyMailGui', 'User', subject, source)


def container():
    # use attachment to add help button
    # this is a bit easier with classes
    root  = Tk()
    title = Button(root, text='PyMail - a Python/Tkinter email client')
    title.config(bg='steelblue', fg='white', relief=RIDGE)
    title.config(command=showhelp)
    title.pack(fill=X)
    decorate(root)
    return root
    
    
if __name__ == '__main__': 
    # run stand-alone or attach
    rootWin = makemainwindow(container())    # or makemainwindow()
    rootWin.mainloop()

11.5 Other Client-Side Tools

So far in this chapter, we have focused on Python's FTP and email processing tools and have met a handful of client-side scripting modules along the way: ftplib, poplib, smtplib, mhlib, mimetools, urllib, rfc822, and so on. This set is representative of Python's library tools for transferring and processing information over the Internet, but it's not at all complete. A more or less comprehensive list of Python's Internet-related modules appears at the start of the previous chapter. Among other things, Python also includes client-side support libraries for Internet news, Telnet, HTTP, and other standard protocols.

11.5.1 NNTP: Accessing Newsgroups

Python's nntplib module supports the client-side interface to NNTP -- the Network News Transfer Protocol -- which is used for reading and posting articles to Usenet newsgroups in the Internet. Like other protocols, NNTP runs on top of sockets and merely defines a standard message protocol; like other modules, nntplib hides most of the protocol details and presents an object-based interface to Python scripts.

We won't get into protocol details here, but in brief, NNTP servers store a range of articles on the server machine, usually in a flat-file database. If you have the domain or IP name of a server machine that runs an NNTP server program listening on the NNTP port, you can write scripts that fetch or post articles from any machine that has Python and an Internet connection. For instance, the script in Example 11-24 by default fetches and displays the last 10 articles from Python's Internet news group, comp.lang.python, from the news.rmi.net NNTP server at my ISP.

Example 11-24. PP2E\Internet\Other\readnews.py

###############################################
# fetch and print usenet newsgroup postings
# from comp.lang.python via the nntplib module
# which really runs on top of sockets; nntplib 
# also supports posting new messages, etc.;
# note: posts not deleted after they are read;
###############################################

listonly = 0
showhdrs = ['From', 'Subject', 'Date', 'Newsgroups', 'Lines']
try:
    import sys
    servername, groupname, showcount = sys.argv[1:]
    showcount  = int(showcount)
except:
    servername = 'news.rmi.net'
    groupname  = 'comp.lang.python'          # cmd line args or defaults
    showcount  = 10                          # show last showcount posts

# connect to nntp server
print 'Connecting to', servername, 'for', groupname
from nntplib import NNTP
connection = NNTP(servername)
(reply, count, first, last, name) = connection.group(groupname)
print '%s has %s articles: %s-%s' % (name, count, first, last)

# get request headers only
fetchfrom = str(int(last) - (showcount-1))
(reply, subjects) = connection.xhdr('subject', (fetchfrom + '-' + last))

# show headers, get message hdr+body
for (id, subj) in subjects:                  # [-showcount:] if fetch all hdrs
    print 'Article %s [%s]' % (id, subj)
    if not listonly and raw_input('=> Display?') in ['y', 'Y']:
        reply, num, tid, list = connection.head(id)
        for line in list:
            for prefix in showhdrs:
                if line[:len(prefix)] == prefix:
                    print line[:80]; break
        if raw_input('=> Show body?') in ['y', 'Y']:
            reply, num, tid, list = connection.body(id)
            for line in list:
                print line[:80]
    print
print connection.quit()

As for FTP and email tools, the script creates an NNTP object and calls its methods to fetch newsgroup information and articles' header and body text. The xhdr method, for example, loads selected headers from a range of messages. When run, this program connects to the server and displays each article's subject line, pausing to ask whether it should fetch and show the article's header information lines (headers listed in variable showhdrs only) and body text:

C:\...\PP2E\Internet\Other>python readnews.py
Connecting to news.rmi.net for comp.lang.python
comp.lang.python has 3376 articles: 30054-33447
Article 33438 [Embedding? file_input and eval_input]
=> Display?

Article 33439 [Embedding? file_input and eval_input]
=> Display?y
From: James Spears <[email protected]>
Newsgroups: comp.lang.python
Subject: Embedding? file_input and eval_input
Date: Fri, 11 Aug 2000 10:55:39 -0700
Lines: 34
=> Show body?

Article 33440 [Embedding? file_input and eval_input]
=> Display?

Article 33441 [Embedding? file_input and eval_input]
=> Display?

Article 33442 [Embedding? file_input and eval_input]
=> Display?

Article 33443 [Re: PYHTONPATH]
=> Display?y
Subject: Re: PYHTONPATH
Lines: 13
From: sp00fd <[email protected]>
Newsgroups: comp.lang.python
Date: Fri, 11 Aug 2000 11:06:23 -0700
=> Show body?y
Is this not what you were looking for?

Add to cgi script:
import sys
sys.path.insert(0, "/path/to/dir")
import yourmodule

-----------------------------------------------------------
Got questions?  Get answers over the phone at Keen.com.
Up to 100 minutes free!
http://www.keen.com

Article 33444 [Loading new code...]
=> Display?

Article 33445 [Re: PYHTONPATH]
=> Display?

Article 33446 [Re: Compile snags on AIX & IRIX]
=> Display?

Article 33447 [RE: string.replace() can't replace newline characters???]
=> Display?

205 GoodBye

We can also pass this script an explicit server name, newsgroup, and display count on the command line to apply it in different ways. Here is this Python script checking the last few messages in Perl and Linux newsgroups:

C:\...\PP2E\Internet\Other>python readnews.py news.rmi.net comp.lang.perl.misc 5
Connecting to news.rmi.net for comp.lang.perl.misc
comp.lang.perl.misc has 5839 articles: 75543-81512
Article 81508 [Re: Simple Argument Passing Question]
=> Display?

Article 81509 [Re: How to Access a hash value?]
=> Display?

Article 81510 [Re: London =?iso-8859-1?Q?=A330-35K?= Perl Programmers Required]
=> Display?

Article 81511 [Re: ODBC question]
=> Display?

Article 81512 [Re: ODBC question]
=> Display?

205 GoodBye


C:\...\PP2E\Internet\Other>python readnews.py news.rmi.net comp.os.linux 4
Connecting to news.rmi.net for comp.os.linux
comp.os.linux has 526 articles: 9015-9606
Article 9603 [Re: Simple question about CD-Writing for Linux]
=> Display?

Article 9604 [Re: How to start the ftp?]
=> Display?

Article 9605 [Re: large file support]
=> Display?

Article 9606 [Re: large file support]
=> Display?y
From: [email protected] (Andreas Schweitzer)
Newsgroups: comp.os.linux.questions,comp.os.linux.admin,comp.os.linux
Subject: Re: large file support
Date: 11 Aug 2000 18:32:12 GMT
Lines: 19
=> Show body?n

205 GoodBye

With a little more work, we could turn this script into a full-blown news interface. For instance, new articles could be posted from within a Python script with code of this form (assuming the local file already contains proper NNTP header lines):

# to post, say this (but only if you really want to post!)
connection = NNTP(servername)
localfile = open('filename')      # file has proper headers
connection.post(localfile)        # send text to newsgroup
connection.quit()

We might also add a Tkinter-based GUI frontend to this script to make it more usable, but we'll leave such an extension on the suggested exercise heap (see also the PyMailGui interface's suggested extensions in the previous section).

11.5.2 HTTP: Accessing Web Sites

Python's standard library (that is, modules that are installed with the interpreter) also includes client-side support for HTTP -- the Hypertext Transfer Protocol -- a message structure and port standard used to transfer information on the World Wide Web. In short, this is the protocol that your web browser (e.g., Internet Explorer, Netscape) uses to fetch web pages and run applications on remote servers as you surf the Net. At the bottom, it's just bytes sent over port 80.

To really understand HTTP-style transfers, you need to know some of the server-side scripting topics covered in the next three chapters (e.g., script invocations and Internet address schemes), so this section may be less useful to readers with no such background. Luckily, though, the basic HTTP interfaces in Python are simple enough for a cursory understanding even at this point in the book, so let's take a brief look here.

Python's standard httplib module automates much of the protocol defined by HTTP and allows scripts to fetch web pages much like web browsers. For instance, the script in Example 11-25 can be used to grab any file from any server machine running an HTTP web server program. As usual, the file (and descriptive header lines) is ultimately transferred over a standard socket port, but most of the complexity is hidden by the httplib module.

Example 11-25. PP2E\Internet\Other\http-getfile.py

#######################################################################
# fetch a file from an http (web) server over sockets via httplib;
# the filename param may have a full directory path, and may name a cgi
# script with query parameters on the end to invoke a remote program;
# fetched file data or remote program output could be saved to a local
# file to mimic ftp, or parsed with string.find or the htmllib module;
#######################################################################

import sys, httplib
showlines = 6
try:
    servername, filename = sys.argv[1:]           # cmdline args?
except:
    servername, filename = 'starship.python.net', '/index.html'

print servername, filename
server = httplib.HTTP(servername)                 # connect to http site/server
server.putrequest('GET', filename)                # send request and headers
server.putheader('Accept', 'text/html')           # POST requests work here too
server.endheaders()                               # as do cgi script file names 

errcode, errmsh, replyheader = server.getreply()  # read reply info headers
if errcode != 200:                                # 200 means success
    print 'Error sending request', errcode
else:
    file = server.getfile()                       # file obj for data received
    data = file.readlines()
    file.close()                                  # show lines with eoln at end
    for line in data[:showlines]: print line,     # to save, write data to file

Desired server names and filenames can be passed on the command line to override hardcoded defaults in the script. You need to also know something of the HTTP protocol to make the most sense of this code, but it's fairly straightforward to decipher. When run on the client, this script makes a HTTP object to connect to the server, sends it a GET request along with acceptable reply types, and then reads the server's reply. Much like raw email message text, the HTTP server's reply usually begins with a set of descriptive header lines, followed by the contents of the requested file. The HTTP object's getfile method gives us a file object from which we can read the downloaded data.

Let's fetch a few files with this script. Like all Python client-side scripts, this one works on any machine with Python and an Internet connection (here it runs on a Windows client). Assuming that all goes well, the first few lines of the downloaded file are printed; in a more realistic application, the text we fetch would probably be saved to a local file, parsed with Python's htmllib module, and so on. Without arguments, the script simply fetches the HTML index page at http://starship.python.org:

C:\...\PP2E\Internet\Other>python http-getfile.py
starship.python.net /index.html
<HTML>
<HEAD>
  <META NAME="GENERATOR" CONTENT="HTMLgen">
  <TITLE>Starship Python</TITLE>
  <SCRIPT language="JavaScript">
<!-- // mask from the infidel

But we can also list a server and file to be fetched on the command line, if we want to be more specific. In the following code, we use the script to fetch files from two different web sites by listing their names on the command lines (I've added line breaks to make these lines fit in this book). Notice that the filename argument can include an arbitrary remote directory path to the desired file, as in the last fetch here:

C:\...\PP2E\Internet\Other>python http-getfile.py
www.python.org /index.html
www.python.org /index.html
<HTML>
<!-- THIS PAGE IS AUTOMATICALLY GENERATED.  DO NOT EDIT. -->
<!-- Wed Aug 23 17:29:24 2000 -->
<!-- USING HT2HTML 1.1 -->
<!-- SEE http://www.python.org/~bwarsaw/software/pyware.html -->
<!-- User-specified headers:

C:\...\PP2E\Internet\Other>python http-getfile.py www.python.org /index
www.python.org /index
Error sending request 404

C:\...\PP2E\Internet\Other>python http-getfile.py starship.python.net
                                                /~lutz/index.html
starship.python.net /~lutz/index.html
<HTML>
<HEAD><TITLE>Mark Lutz's Starship page</TITLE></HEAD>
<BODY>

<H1>Greetings</H1>

Also notice the second attempt in this code: if the request fails, the script receives and displays an HTTP error code from the server (we forgot the .html on the filename). With the raw HTTP interfaces, we need to be precise about what we want.

Technically, the string we call filename in the script can refer to either a simple static web page file, or a server-side program that generates HTML as its output. Those server-side programs are usually called CGI scripts -- the topic of the next three chapters. For now, keep in mind that when filename refers to a script, this program can be used to invoke another program that resides on a remote server machine. In that case, we can also specify parameters (called a query string) to be passed to the remote program after a ?. Here, for instance, we pass a language=Python parameter to a CGI script we will meet in the next chapter:

C:\...\PP2E\Internet\Other>python http-getfile.py starship.python.net
                             /~lutz/Basics/languages.cgi?language=Python
starship.python.net /~lutz/Basics/languages.cgi?language=Python
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>Python</H3><P><PRE>
 print 'Hello World'
</PRE></P><BR>
<HR>

This book has much more to say about HTML, CGI scripts, and the meaning of an HTTP GET request (one way to format information sent to a HTTP server) later, so we'll skip additional details here. Suffice it to say, though, that we could use the HTTP interfaces to write our own web browsers and build scripts that use web sites as though they were subroutines. By sending parameters to remote programs and parsing their results, web sites can take on the role of simple in-process functions (albeit, much more slowly and indirectly).

11.5.2.1 urllib revisited

The httplib module we just met provides low-level control for HTTP clients. When dealing with items available on the Web, though, it's often easier to code downloads with Python's standard urllib module introduced in the FTP section of this chapter. Since this module is another way to talk HTTP, let's expand on its interfaces here.

Recall that given a URL, urllib either downloads the requested object over the Net to a local file, or gives us a file-like object from which we can read the requested object's contents. Because of that, the script in Example 11-26 does the same work as the httplib script we just wrote, but requires noticeably less typing.

Example 11-26. PP2E\Internet\Other\http-getfile-urllib1.py

###################################################################
# fetch a file from an http (web) server over sockets via urllib;
# urllib supports http, ftp, files, etc. via url address strings;
# for hhtp, the url can name a file or trigger a remote cgi script;
# see also the urllib example in the ftp section, and the cgi 
# script invocation in a later chapter; files can be fetched over
# the net with Python in many ways that vary in complexity and 
# server requirements: sockets, ftp, http, urllib, cgi outputs;
# caveat: should run urllib.quote on filename--see later chapters;
###################################################################

import sys, urllib
showlines = 6
try:
    servername, filename = sys.argv[1:]              # cmdline args?
except:
    servername, filename = 'starship.python.net', '/index.html'

remoteaddr = 'http://%s%s' % (servername, filename)  # can name a cgi script too
print remoteaddr
remotefile = urllib.urlopen(remoteaddr)              # returns input file object
remotedata = remotefile.readlines()                  # read data directly here
remotefile.close()
for line in remotedata[:showlines]: print line,

Almost all HTTP transfer details are hidden behind the urllib interface here. This version works about the same as the httplib version we wrote first, but builds and submits an Internet URL address to get its work done (the constructed URL is printed as the script's first output line). As we saw in the FTP section of this chapter, the urllib urlopen function returns a file-like object from which we can read the remote data. But because the constructed URLs begin with "http://" here, the urllib module automatically employs the lower-level HTTP interfaces to download the requested file, not FTP:

C:\...\PP2E\Internet\Other>python http-getfile-urllib1.py
http://starship.python.net/index.html
<HTML>
<HEAD>
  <META NAME="GENERATOR" CONTENT="HTMLgen">
  <TITLE>Starship Python</TITLE>
  <SCRIPT language="JavaScript">
<!-- // mask from the infidel

C:\...\PP2E\Internet\Other>python http-getfile-urllib1.py www.python.org /index
http://www.python.org/index
<HTML>
<!-- THIS PAGE IS AUTOMATICALLY GENERATED.  DO NOT EDIT. -->
<!-- Fri Mar  3 10:28:30 2000 -->
<!-- USING HT2HTML 1.1 -->
<!-- SEE http://www.python.org/~bwarsaw/software/pyware.html -->
<!-- User-specified headers:

C:\...\PP2E\Internet\Other>python http-getfile-urllib1.py starship.python.net
                                                   /~lutz/index.html
http://starship.python.net/~lutz/index.html
<HTML>
<HEAD><TITLE>Mark Lutz's Starship page</TITLE></HEAD>
<BODY>

<H1>Greetings</H1>

C:\...\PP2E\Internet\Other>python http-getfile-urllib1.py starship.python.net
                               /~lutz/Basics/languages.cgi?language=Java
http://starship.python.net/~lutz/Basics/languages.cgi?language=Java
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>Java</H3><P><PRE>
 System.out.println("Hello World");
</PRE></P><BR>
<HR>

As before, the filename argument can name a simple file or a program invocation with optional parameters at the end. If you read this output carefully, you'll notice that this script still works if you leave the .html off the end of a filename (in the second command line); unlike the raw HTTP version, the URL-based interface is smart enough to do the right thing.

11.5.2.2 Other urllib interfaces

One last mutation: the following urllib downloader script uses the slightly higher-level urlretrieve interface in that module to automatically save the downloaded file or script output to a local file on the client machine. This interface is handy if we really mean to store the fetched data (e.g., to mimic the FTP protocol). If we plan on processing the downloaded data immediately, though, this form may be less convenient than the version we just met: we need to open and read the saved file. Moreover, we need to provide extra protocol for specifying or extracting a local filename, as in Example 11-27.

Example 11-27. PP2E\Internet\Other\http-getfile-urllib2.py

####################################################################
# fetch a file from an http (web) server over sockets via urlllib;
# this version uses an interface that saves the fetched data to a
# local file; the local file name is either passed in as a cmdline 
# arg or stripped from the url with urlparse: the filename argument
# may have a directory path at the front and query parmams at end,
# so os.path.split is not enough (only splits off directory path);  
# caveat: should run urllib.quote on filename--see later chapters;
####################################################################

import sys, os, urllib, urlparse
showlines = 6
try:
    servername, filename = sys.argv[1:3]              # first 2 cmdline args?
except:
    servername, filename = 'starship.python.net', '/index.html'

remoteaddr = 'http://%s%s' % (servername, filename)   # any address on the net
if len(sys.argv) == 4:                                # get result file name
    localname = sys.argv[3]
else:
    (scheme, server, path, parms, query, frag) = urlparse.urlparse(remoteaddr)
    localname = os.path.split(path)[1]

print remoteaddr, localname
urllib.urlretrieve(remoteaddr, localname)               # can be file or script
remotedata = open(localname).readlines()                # saved to local file
for line in remotedata[:showlines]: print line,

Let's run this last variant from a command line. Its basic operation is the same as the last two versions: like the prior one, it builds a URL, and like both of the last two, we can list an explicit target server and file path on the command line:

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py
http://starship.python.net/index.html index.html
<HTML>
<HEAD>
  <META NAME="GENERATOR" CONTENT="HTMLgen">
  <TITLE>Starship Python</TITLE>
  <SCRIPT language="JavaScript">
<!-- // mask from the infidel

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py 
                                           www.python.org /index.html
http://www.python.org/index.html index.html
<HTML>
<!-- THIS PAGE IS AUTOMATICALLY GENERATED.  DO NOT EDIT. -->
<!-- Wed Aug 23 17:29:24 2000 -->
<!-- USING HT2HTML 1.1 -->
<!-- SEE http://www.python.org/~bwarsaw/software/pyware.html -->
<!-- User-specified headers:

Because this version uses an urllib interface that automatically saves the downloaded data in a local file, it's more directly like FTP downloads in spirit. But this script must also somehow come up with a local filename for storing the data. You can either let the script strip and use the base filename from the constructed URL, or explicitly pass a local filename as a last command-line argument. In the prior run, for instance, the downloaded web page is stored in local file index.html -- the base filename stripped from the URL (the script prints the URL and local filename as its first output line). In the next run, the local filename is passed explicitly as python-org-index.html:

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py www.python.org 
                                        /index.html python-org-index.html
http://www.python.org/index.html python-org-index.html
<HTML>
<!-- THIS PAGE IS AUTOMATICALLY GENERATED.  DO NOT EDIT. -->
<!-- Wed Aug 23 17:29:24 2000 -->
<!-- USING HT2HTML 1.1 -->
<!-- SEE http://www.python.org/~bwarsaw/software/pyware.html -->
<!-- User-specified headers:

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py starship.python.net
                                        /~lutz/home/index.html
http://starship.python.net/~lutz/home/index.html index.html
<HTML>

<HEAD>
<TITLE>Mark Lutz's Home Page</TITLE>
</HEAD>


C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py starship.python.net
                                        /~lutz/home/about-pp.html
http://starship.python.net/~lutz/home/about-pp.html about-pp.html
<HTML>

<HEAD>
<TITLE>About "Programming Python"</TITLE>
</HEAD>

Below is a listing showing this third version being used to trigger a remote program. As before, if you don't give the local filename explicitly, the script strips the base filename out of the filename argument. That's not always easy or appropriate for program invocations -- the filename can contain both a remote directory path at the front, and query parameters at the end for a remote program invocation.

Given a script invocation URL and no explicit output filename, the script extracts the base filename in the middle by using first the standard urlparse module to pull out the file path, and then os.path.split to strip off the directory path. However, the resulting filename is a remote script's name, and may or may not be an appropriate place to store the data locally. In the first run below, for example, the script's output goes in a local file called languages.cgi, the script name in the middle of the URL; in the second, we name the output CxxSyntax.html explicitly instead to suppress filename extraction:

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py starship.python.net
                              /~lutz/Basics/languages.cgi?language=Perl
http://starship.python.net/~lutz/Basics/languages.cgi?language=Perl 
                                                  languages.cgi
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>Perl</H3><P><PRE>
 print "Hello World\n";
</PRE></P><BR>
<HR>

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py starship.python.net
                      /~lutz/Basics/languages.cgi?language=C++ CxxSyntax.html
http://starship.python.net/~lutz/Basics/languages.cgi?language=C++ 
                                                  CxxSyntax.html
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>C  </H3><P><PRE>
Sorry--I don't know that language
</PRE></P><BR>
<HR>

The remote script returns a not-found message when passed "C++" in the last command here. It turns out that "+" is a special character in URL strings (meaning a space), and to be robust, both of the urllib scripts we've just written should really run the filename string though something called urllib.quote , a tool that escapes special characters for transmission. We will talk about this in depth in the next chapter, so consider this all a preview for now. But to make this invocation work, we need to use special sequences in the constructed URL; here's how to do it by hand:

C:\...\PP2E\Internet\Other>python http-getfile-urllib2.py  starship.python.net
               /~lutz/Basics/languages.cgi?language=C%2b%2b CxxSyntax.html
http://starship.python.net/~lutz/Basics/languages.cgi?language=C%2b%2b 
                                                  CxxSyntax.html
<TITLE>Languages</TITLE>
<H1>Syntax</H1><HR>
<H3>C++</H3><P><PRE>
 cout &lt;&lt; "Hello World" &lt;&lt; endl;
</PRE></P><BR>
<HR>

The odd "%2b" strings in this command line are not entirely magical: the escaping required for URLs can be seen by running standard Python tools manually (this is what these scripts should do automatically to handle all possible cases well):

C:\...\PP2E\Internet\Other>python
Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import urllib
>>> urllib.quote('C++')
'C%2b%2b'

Again, don't work too hard at understanding these last few commands; we will revisit URLs and URL escapes in the next chapter, while exploring server-side scripting in Python. I will also explain there why the C++ result came back with other oddities like << -- HTML escapes for <<.

11.5.3 Other Client-Side Scripting Options

In this chapter, we've focused on client-side interfaces to standard protocols that run over sockets, but client-side programming can take other forms, too. For instance, in Chapter 15 we'll also see that Python code can be embedded inside the HTML code that defines a web page, with the Windows Active Scripting extension. When Internet Explorer downloads such a web page file from a web server, the embedded Python scripts are actually executed on the client machine, with an object API that gives access to the browser's context. Code in HTML is downloaded over a socket initially, but its execution is not bound up with a socket-based protocol.

In Chapter 15, we'll also meet client-side options such as the JPython (a.k.a. "Jython") system, a compiler that supports Python-coded Java applets -- general-purpose programs downloaded from a server and run locally on the client when accessed or referenced by a URL. We'll also peek at Python tools for processing XML -- structured text that may become a common language of client/server dialogs in the future.

In deference to time and space, though, we won't go into further details on these and other client-side tools here. If you are interested in using Python to script clients, you should take a few minutes to become familiar with the list of Internet tools documented in the Python library reference manual. All work on similar principles, but have slightly distinct interfaces.

In the next chapter, we'll hop the fence to the other side of the Internet world and explore scripts that run on server machines. Such programs give rise to the grander notion of applications that live entirely on the Web and are launched by web browsers. As we take this leap in structure, keep in mind that the tools we met in this and the previous chapter are often sufficient to implement all the distributed processing that many applications require, and they can work in harmony with scripts that run on a server. To completely understand the web world view, though, we need to explore the server realm, too.

[1]  For more urllib download examples, see the section on HTTP in this chapter. In bigger terms, tools like urllib.urlopen allow scripts to both download remote files and invoke programs that are located on a remote server machine. In Chapter 12, we'll also see that urllib includes tools for formatting (escaping) URL strings for safe transmission.
[2]  This is one point in the class where I also usually threaten to write Guido's home phone number on the whiteboard. But that's generally an empty promise made just for comic effect. If you do want to discuss Python language issues, Guido's email address, as well as contact points for other Python core developers, are readily available on the Net. As someone who's gotten anonymous Python-related calls at home, I never do give out phone numbers (and dialing 1-800-Hi-Guido is only funny the first time).
[3]  Technically, Python's storlines method automatically sends all lines to the server with \r\n line-feed sequences, no matter what it receives from the local file's readline method (\n or \r\n). Because of that, the most important distinctions for uploads are to use the "rb" for binary mode and the storlines method for text. Consult module ftplib.py in the Python source library directory for more details.
[4]  IMAP, or Internet Message Access Protocol, was designed as an alternative to POP, but is not as widely used today, and so is not presented in this text. See the Python library manual for IMAP support details.
[5]  In the process of losing Telnet, my email account and web site were taken down for weeks on end, and I lost forever a backlog of thousands of messages saved over the course of a year. Such outages can be especially bad if your income is largely driven by email and web contacts, but that's a story for another night, boys and girls.
[6]  An extra raw_input is inserted on Windows only, in order to clear the stream damage of the getpass call; see the note about this issue in the FTP section of this chapter.
[7]  Such junk mail is usually referred to as spam, a reference to a Monty Python skit where people trying to order breakfast at a restaurant were repeatedly drowned out by a group of Vikings singing an increasingly loud chorus of "spam, spam, spam,..." (no, really). While spam can be used in many ways, this usage differs both from its appearance in this book's examples, and its much-lauded role as a food product.
[8]  More on POP message numbers when we study PyMailGui later in this chapter. Interestingly, the list of message numbers to be deleted need not be sorted; they remain valid for the duration of the connection.
[9]  I should explain this one: I'm referring to email viruses that appeared in 2000. The short story behind most of them is that Microsoft Outlook sported a "feature" that allowed email attachments to embed and contain executable scripts, and allowed these scripts to gain access to critical computer components when open and run. Furthermore, Outlook had another feature that automatically ran such attached scripts when an email was inspected, whether the attachment was manually opened or not. I'll leave the full weight of such a security hole for you to ponder, but I want to add that if you use Python's attachment tools in any of the mail programs in this book, please do not execute attached programs under any circumstance, unless you also run them with Python's restricted execution mode presented in Chapter 15.
[10]  Example: I added code to pull the POP password from a local file instead of a pop-up in about 10 minutes, and less than 10 lines of code. Of course, I'm familiar with the code, but the wait time for new features in Outlook would be noticeably longer.
[11]  Actually, the help display started life even less fancy: it originally displayed help text in a standard information box pop-up, generated by the Tkinter showinfo call used earlier in the book. This worked fine on Windows (at least with a small amount of help text), but failed on Linux because of a default line-length limit in information pop-up boxes -- lines were broken so badly as to be illegible. The moral: if you're going to use showinfo and care about Linux, be sure to make your lines short and your text strings small.
[12]  If you want to see how this works, change PyMailGui's code such that the fakeThread class near the top of file PyMailGui.py is always defined (by default, it is created only if the import of the thread module fails), and try covering and uncovering the main window during a load, send, or delete operation. The window won't be redrawn because a single-threaded PyMailGui is busy talking over a socket.

CONTENTS